Site Reliability Engineer

Related keywords: remote job Germany, remote job France, remote job Spain

Company Overview

Mistral AI is a dynamic company driven by the purpose of harnessing the potential of AI to simplify daily tasks, save time, and boost creativity. The organization focuses on creating high-performance, open-source models and solutions that integrate seamlessly into everyday operations. With teams spread out across the USA, UK, France, Germany, and Singapore, Mistral AI thrives in a competitive culture characterized by collaboration and a commitment to innovation.

Job Title and Role Summary

This Site Reliability Engineer (SRE) role is aimed at candidates with significant experience balancing operational work with long-term software engineering. Its primary goal is to increase the reliability, scalability, and performance of Mistral’s customer-facing platforms and applications. It suits engineers who want to take on continuous improvement work and build solutions that optimize infrastructure and operational processes.

Key Responsibilities

As a Site Reliability Engineer, candidates will engage in both operations (50%) and development (50%) tasks. Below are the main responsibilities associated with this role:

Operations Tasks

  • Design, build, and maintain infrastructures that support web services and machine learning workloads, ensuring they are scalable and fault-tolerant.
  • Ensure high availability and seamless replication of working environments across several high-performance computing (HPC) clusters.
  • Operate production environments while troubleshooting issues, responding to incidents, and performing user administration.
  • Improve monitoring, alerting, and incident response systems to minimize downtime and enhance system performance.
  • Participate in on-call rotations to respond to incidents, conducting root cause analysis as needed.

Development Tasks

  • Drive improvements in infrastructure automation and deployment practices through tools like Kubernetes, Flux, and Terraform.
  • Collaborate with AI/ML teams to develop solutions for secure and reproducible model-training experiments.
  • Build a cloud-agnostic platform that connects research and infrastructure effectively.
  • Design and enhance workflows and tooling to improve system reliability, availability, and performance.
  • Document processes and ensure knowledge sharing within the team.
  • Contribute to open-source projects and collaborate on research publications.

Required Skills and Qualifications

Candidates must bring specific skills and qualifications to be considered:

  • A Master’s degree in Computer Science, Engineering, or a related field.
  • 7+ years of experience in a DevOps/SRE role.
  • Strong experience with cloud computing and distributed system architectures.
  • Familiarity with site reliability and incident response in critical environments.
  • Hands-on experience with CI/CD, containerization, and orchestration tools (like Docker and Kubernetes).
  • Knowledge of monitoring, logging, and observability tools such as Prometheus, Grafana, and Datadog.
  • Proficiency in scripting languages (Python, Go, Bash) and familiarity with infrastructure-as-code tools (like Terraform).
  • Strong networking, security, and system administration knowledge.
  • Excellent problem-solving abilities.
  • Self-motivated and suited to work in a fast-paced startup environment.

Additional beneficial experience includes exposure to an AI/ML environment, familiarity with HPC systems, and a background working with AI-focused cloud providers like Fluidstack, CoreWeave, or Vast.

Work Environment and Location

This role is primarily remote, with a focus on locations within Europe. Mistral AI encourages candidates from countries such as France, the UK, Germany, Belgium, the Netherlands, Spain, and Italy to apply. However, the company prefers candidates who either reside in Paris or are open to relocation.

New hires are required to visit the Paris office for their onboarding week, after which they are expected to work in person at least three days per month.

Salary and Benefits

Specific salary details are not disclosed, but the role offers competitive compensation and equity, along with benefits that include health insurance, a transportation allowance, a sport allowance, meal vouchers, a private pension plan, and a generous parental leave policy. The company also offers visa sponsorship for eligible candidates relocating to work in the regions mentioned above.

Conclusion

This position as a Site Reliability Engineer at Mistral AI is an excellent opportunity for experienced professionals ready to make a significant impact in a pioneering company focused on the future of AI. The role promotes a collaborative environment that values creativity and innovation while providing remote flexibility across Europe.



This job offer was originally published on himalayas.app

Mistral AI

United Kingdom

Software development

Full-time

February 17, 2026


This job offer summary has been generated using automated technology. While we strive for accuracy, it may not always fully capture the nuances and details of the original job posting. We recommend reviewing the complete job listing before making any decisions or applications.