Site Reliability Engineer (SRE)

Related keywords: remote job international, remote job with flexible hours, remote job flexible hours

Job Overview

Blackfluo.ai is seeking a Site Reliability Engineer (SRE) to join its international engineering team. The role is fully remote and open to candidates working within EU time zones (CET +/- 2 hours). The successful candidate should be proficient in English and able to start as soon as possible.

Key Responsibilities

As a Site Reliability Engineer, your primary responsibilities will include:

  • Designing, implementing, and maintaining scalable and resilient AWS infrastructure.
  • Developing and managing CI/CD pipelines and applying infrastructure-as-code principles, using Terraform or similar tools.
  • Setting up and optimizing monitoring, alerting, and incident response processes to enhance overall system performance.
  • Proactively identifying and resolving issues related to performance, reliability, and security within the infrastructure.
  • Collaborating with development teams to integrate SRE best practices into their workflows.
  • Conducting post-mortems and root cause analyses on incidents to improve future reliability and performance.
  • Participating in on-call rotations to ensure 24/7 system reliability, demonstrating a hands-on approach to maintaining the systems.

Required Skills and Qualifications

Candidates should possess the following skills and experience:

  • A minimum of 5 years of experience in a Site Reliability Engineer or similar role.
  • Deep knowledge of AWS services such as EC2, ECS, RDS, Lambda, and S3.
  • Proficiency in infrastructure-as-code tools like Terraform and CloudFormation.
  • Solid experience in Linux systems administration and a good understanding of networking concepts.
  • Strong programming or scripting skills in languages like Python, Bash, or Go.
  • Experience with CI/CD tools such as GitLab CI or Jenkins.
  • Familiarity with observability tools like Prometheus, Grafana, or Datadog.

In addition to these requirements, the following skills are considered a plus:

  • Experience with container orchestration tools (e.g., ECS, EKS, or Kubernetes).
  • Understanding of security best practices for cloud environments.
  • Exposure to incident management frameworks, particularly the SRE handbook.

Benefits of Joining Blackfluo.ai

Joining Blackfluo.ai as an SRE offers several advantages:

  • The opportunity for 100% remote work with flexible hours, allowing you to maintain a healthy work-life balance.
  • A high-impact role where you will have autonomy and ownership over your work.
  • The chance to work within a collaborative and international engineering team, gaining insights from various perspectives.
  • Exposure to a cutting-edge tech stack with an emphasis on reliability and automation, ensuring that you remain at the forefront of technological advancements.

Conclusion

This position represents a fantastic opportunity for experienced Site Reliability Engineers looking to contribute significantly while working in a flexible remote capacity. Interested candidates are encouraged to apply promptly to maximize their chances in this competitive field.



This job offer was originally published on himalayas.app

Blackfluo.ai

Full remote, EU timezone (CET +/- 2 hours)

Software development

Full-time

December 30, 2025



This job offer summary has been generated using automated technology. While we strive for accuracy, it may not always fully capture the nuances and details of the original job posting. We recommend reviewing the complete job listing before making any decisions or applications.