Site Reliability Engineer (SRE)

Job Description

Posted 4 hours ago

Job Title: Lead Site Reliability Engineer (SRE) – Observability

Location: Reading, UK / Hybrid & Remote Options

About the Role

We are looking for a Lead SRE to design, scale, and operate the large-scale observability systems that keep our global services online and performant. You will join an autonomous team of software engineers focused on solving complex data infrastructure challenges.

Key Responsibilities

Scale Prometheus metrics infrastructure to handle 100+ million active series.

Operate large Elasticsearch clusters holding 2,000+ TB of data.

Grow high-throughput Kafka data pipelines processing hundreds of thousands of events per second.

Build custom alerting workflows and self-service APIs for internal engineering teams.

Provision cloud and private infrastructure using Terraform.

Requirements

5+ years operating mid-to-large distributed systems on Linux VMs or bare-metal machines.

2+ years developing in Go, Python, Ruby, Scala, or Bash.

Hands-on experience with Prometheus/Thanos/Cortex, Kafka, the ELK stack, Ansible, or Consul.

Comfortable diving into unfamiliar codebases and participating in an on-call rotation.

Keywords: Observability, Monitoring, SRE, Site Reliability Engineering, DevOps, Elasticsearch, ELK, Prometheus, Kafka, Terraform, Linux, Bare Metal

Randstad Technologies is acting as an Employment Business in relation to this vacancy.
