Senior Site Reliability Engineer at Cookpad International Ltd (Bristol, UK)
Cookpad is looking for engineers to join our Site Reliability Engineering team. Site Reliability Engineers are a hybrid between system engineers and software engineers who are responsible for and who take ownership of reliability, automation, and scalability. You will focus on the systems and tools that enable our engineers to operate and scale the largest recipe sharing community in the world.
As an SRE, you will build high-performance, scalable systems with AWS and software. You will also work closely with engineers to advocate sensible, scalable systems design and share responsibility with them for diagnosing, resolving, and preventing production issues. When incidents occur, you will triage, mitigate, and resolve them together with product team engineers.
WHAT YOU’LL DO
- Build highly available, performant and scalable service infrastructure with AWS and software
- Design, develop and implement software that improves the stability, scalability, availability and latency of Cookpad
- Solve problems occurring with our highly available production systems and build solutions and automation to prevent them from happening again
- Participate in the operations on-call rotation, triaging and addressing production issues as they arise
- Contribute to internal tools that help us improve our operations processes, manage our infrastructure, and scale our systems
- Engage with product engineering teams to triage production outages and carry forward action items to improve ongoing reliability
- Undertake measured, methodical troubleshooting of complicated systems under pressure
Desired Skills and Experience
- 3+ years of SRE/DevOps experience in a Linux-based AWS environment
- 2+ years of professional experience with Ruby on Rails
- Strong written communication skills in English and the ability to develop working relationships with coworkers in locations around the globe
- Understanding of the fundamentals of the TCP/IP (OSI) model and network architectures
- Strong coding skills in at least one programming language. Cookpad server side engineers work primarily in Ruby, with smatterings of shell script, Go, and Python
- You are familiar with configuration management software such as Puppet and Chef
- You possess a passion for solving problems using open source software
- Solid foundation in deploying and managing Linux systems at large scale
- Understanding of large-scale complex systems from a reliability perspective
- Solid competency with SQL (ideally in a federated database environment; MySQL a plus)
- Contributions to open source
- Deep network analysis experience is a plus
- Strong Linux system-level analysis capabilities (Ubuntu a plus)
- Knowledge and experience of highly available, scalable architectures for services deployed across multiple regions is a big plus