Job description for HPC System Engineer at Techdirect
Responsible for the operation, administration, maintenance, troubleshooting, and optimization of NSCC's High Performance Computing (HPC) infrastructure, including Linux servers, compute nodes, login nodes, HCI platforms, virtualization systems, monitoring, backups, patching, and security hardening.
Key Responsibilities
Linux & HPC Infrastructure Administration
Administer Red Hat Enterprise Linux (RHEL) based HPC environments
Manage HPC compute nodes and login nodes
Create and maintain golden OS images
Perform server provisioning and decommissioning
Manage kernel and driver upgrades
Conduct system health checks and performance monitoring
Virtualization & HCI
Support HCI clusters and virtualization platforms
VM lifecycle management
Template creation and maintenance
Backup and restore validation
Disaster recovery testing
Configuration & Change Management
Maintain system configuration baselines
Detect and remediate configuration drift
Execute approved change requests
Produce operational documentation and runbooks
Security & Compliance
Linux hardening
Vulnerability remediation
Patch management
Certificate and credential management
Audit support and evidence collection
Operations Support
Incident troubleshooting and resolution
Root Cause Analysis (RCA)
Performance tuning
Capacity management
Vendor escalation management
Mandatory Skills
Operating Systems
Red Hat Enterprise Linux (RHEL)
Rocky Linux / AlmaLinux
Ubuntu Linux
Virtualization
One or more of the following:
VMware vSphere
Nutanix AHV
OpenShift Virtualization
KVM
Scripting & Automation
Bash
Python
Ansible
Git
Monitoring
Grafana
Prometheus
Zabbix
ELK Stack (Elasticsearch, Logstash, Kibana)
Splunk
Storage Knowledge
NFS
POSIX filesystems
SAN/NAS concepts
Networking Knowledge
TCP/IP
DNS
VLAN
Linux networking
Experience Requirements
Minimum 3 years of Linux systems administration experience
Experience supporting mission-critical environments
Experience with virtualization technologies
Experience with automation tools
Experience supporting large-scale infrastructure
Certifications
Mandatory:
ITIL Foundation
RHCSA (Red Hat Certified System Administrator)
Preferred:
RHCE
VMware VCP
Red Hat OpenShift Certification
Nutanix NCP
