As a Site Reliability Engineer (SRE) in Twitter’s Real Time Storage team, you will work to improve the reliability and performance of the next generation of distributed systems. You will partner with our product engineering teams to design, build, operate, and automate the distributed storage systems at the heart of Twitter’s infrastructure, which are used by millions of people.
We are looking for software engineers who are passionate about reliability, performance, and efficiency, and who have experience building tools, services, and automation to manage and improve production services.
– Build tooling to improve the automation of operations. This includes automatic failure detection and remediation, application deployment, OS/Kernel/JVM/Firmware deployment, capacity planning, and fleet management.
– Diagnose and troubleshoot complex distributed systems handling millions of queries per second and petabytes of data, and develop solutions that have a significant impact at our massive scale.
– Collaborate with software engineers to sustain and optimize service availability, reliability, and performance.
– Collaborate with the diverse hardware, software, and networking teams throughout the company to design next-generation distributed storage platforms.
– Troubleshoot issues across the entire stack – hardware, software, application and network.
– Sustain data privacy and service security compliance.
– Participate in a 24×7 on-call rotation.
To apply for this job, please visit itjobpro.co.uk.