Viridien (www.viridiengroup.com) is an advanced technology, digital and Earth data company that pushes the boundaries of science for a more prosperous and sustainable future. With our ingenuity, drive and deep curiosity we discover new insights, innovations, and solutions that efficiently and responsibly resolve complex natural resource, digital, energy transition and infrastructure challenges.
Job Summary:
We are seeking a highly experienced and skilled Senior Linux Administrator to join our IT team, with a focus on High-Performance Computing (HPC) and Cloud infrastructure. The successful candidate will have a proven and demonstrable track record in Linux administration, with a strong understanding of system administration, troubleshooting, and IT service management.
Key Responsibilities:
Install, configure, maintain, and repair Linux-based hardware and software, ensuring high functionality, performance, and reliability.
Test, verify, and deploy large-scale service upgrades and fixes for existing and new services.
Provide expert-level technical support to users and junior team members, including mentoring and training to continuously enhance skills and knowledge.
Respond to escalated support tickets, troubleshoot complex issues, and resolve system crashes, performance degradation, and security breaches.
Monitor systems and services through performance monitoring, log analysis, and proactive issue detection.
Continuously improve services and processes through automation, optimization, and standardization, while staying updated on technology trends, regulations, and best practices.
Configure and manage HPC job schedulers, troubleshoot job scheduling issues, and support high-performance computing environments.
Align IT service delivery with customer and business needs.
Ensure services comply with company and IT policies and standards.
Contribute to the development and maintenance of comprehensive technical documentation, including system diagrams, configuration files, and troubleshooting guides, to support knowledge sharing, operational efficiency, and continuous process improvement.
Demonstrate strong organizational and time management skills, with the ability to prioritize tasks, manage multiple projects, and communicate complex technical information effectively to both senior management and technical teams.
Learn and support in-house software applications.
Contribute to the design and architecture of Linux-based systems and infrastructure, ensuring high availability, scalability, and performance, and aligning with the organization's technical vision and strategy.
Participate in the evaluation and recommendation of new technologies and solutions and collaborate with stakeholders to develop and implement strategic plans for infrastructure growth and development.
Skills & Competencies:
Essential:
5+ years of demonstrable experience in Linux-based server and network administration, preferably in an HPC environment.
Strong understanding of system administration, troubleshooting, and IT service management.
Experience with automation/configuration management using Puppet, Chef, Salt, Ansible, GitLab, or an equivalent.
Ability to use a wide variety of open-source technologies.
Experience with virtualization, containers, and orchestration tools such as Kubernetes, Proxmox, Docker, Podman, or similar.
Familiarity with coding and scripting languages (Bash, Python, Perl), including shell scripting.
Excellent troubleshooting and problem-solving skills.
Desirable:
Experience with OpenStack and other cloud infrastructure platforms.
Experience with DevOps methodologies and tools such as CI/CD.
Experience in cloud administration, virtualization, and hardware maintenance (Storage/CPU/GPU).
Relevant industry certifications, such as CCNA, CKA, RHCE, or other related technology certifications.
ITIL Foundation level certification.
Familiarity with GPU technology on Linux platforms, including CUDA, OpenCL, ROCm, and GPU drivers, with a focus on parallel processing, firmware management, and performance optimization.
Experience with High Performance Computing (HPC) and clustering technology (object storage, parallel file systems, RAID storage, tape subsystems).
Experience in a high-volume critical production service environment.
Qualifications and Experience:
Bachelor's degree in IT, Computer Science, Computer Engineering, or a related field (or equivalent work experience).
Demonstrable experience in scripting and automation, cloud technologies, and DevOps practices is preferred.
Knowledge of internet security and data privacy principles.
Excellent communication, presentation, and customer service skills, with an outstanding track record of meeting customer expectations.
Must be detail-oriented and work well in a team environment.
Must have the legal right to live and work in the United States.
We see things differently. Diversity fuels our innovation; we value the unique ways in which we differ, and we are committed to equal employment opportunities for all professionals.