HPC Operations Engineer

Join CoreWeave's HPC Operations team as a remote problem-solver ensuring uptime and efficiency of our high-performance supercomputing clusters.

About the Role at CoreWeave

CoreWeave is expanding its High Performance Computing Operations team and is looking for dedicated individuals prepared to work a 10am to 7pm PST schedule, with potential for later hours. This role can be performed entirely remotely.

Team Responsibilities

The team’s duties revolve around the daily provisioning, management, and uptime of CoreWeave's extensive fleet of server nodes. As CoreWeave grows, the team plays a pivotal role in handling the supercomputing clusters, which includes configuring, updating, and remotely troubleshooting them. The goal is to maximize the availability of nodes for customer use.

Job Functions

Applicants should be comfortable working through provisioning and validation processes for server nodes and be adept at troubleshooting. The role involves aggressive problem-solving to improve system performance and health, contributing to documentation, and improving team processes. You will work with a group of skilled engineers dedicated to rapidly deploying nodes.

Key Responsibilities

Key responsibilities include installing and maintaining high-performance computing clusters, hardware and software troubleshooting, performance monitoring, flexible and optimistic problem-solving, and maintaining operational documentation.

Candidate Profile

Ideal candidates have at least two years of experience with data center infrastructure and a strong grasp of Linux systems and network administration. They should be consistent and reliable in their troubleshooting and system maintenance tasks. Additional experience with scripting, observability platforms, and Kubernetes is a plus.

Compensation Details

CoreWeave offers a competitive salary ranging from $80,000 to $110,000 per year, depending on geographic market costs and the applicant's experience, skills, and knowledge. The company is located in Las Vegas, Nevada, but the position is open to remote candidates across the United States.

Conclusion

CoreWeave is seeking driven problem solvers passionate about managing and troubleshooting high-performance computing clusters in a remote role with flexible hours geared towards Pacific Standard Time.



This job offer was originally published on RemoteOK

CoreWeave

Las Vegas, Nevada

Operations

Full-time

March 7, 2024

This job offer summary has been generated using automated technology. While we strive for accuracy, it may not always fully capture the nuances and details of the original job posting. We recommend reviewing the complete job listing before making any decisions or applications.