Cookpad is looking for engineers to join our Site Reliability Engineering team. Site Reliability Engineers are a hybrid of systems engineer and software engineer who take ownership of reliability, automation, and scalability. You will focus on the systems and tools that enable our engineers to operate and scale the largest recipe-sharing community in the world.

WHAT YOU’LL DO

As an SRE, you will build high-performance, scalable systems using AWS and software. You will also work closely with engineers to advocate sensible, scalable systems design and share responsibility with them for diagnosing, resolving, and preventing production issues. When incidents occur, you will triage, mitigate, and resolve them together with product team engineers.

Desired Skills and Experience

  • Build highly available, performant and scalable service infrastructure with AWS
  • Design, develop and implement software that improves the stability, scalability, availability and latency of Cookpad
  • Solve problems occurring with our highly available production systems and build solutions and automation to prevent them from happening again
  • Participate in the operations on-call rotation, triaging and addressing production issues as they arise
  • Contribute to internal tools that help us improve our operations processes, manage our infrastructure, and scale our systems
  • Engage with product engineering teams to triage production outages and carry forward action items to improve ongoing reliability
  • Undertake measured, methodical troubleshooting of complicated systems under pressure
  • You are expected to have strong written communication skills in English and be able to develop working relationships with coworkers in locations around the globe
  • A solid grasp of the fundamentals of the TCP/IP and OSI models and of network architectures
  • A solid foundation in deploying and managing Linux systems at large scale
  • Strong coding skills in at least one programming language. Cookpad server-side engineers work primarily in Ruby, with smatterings of shell script, Go, and Python
  • Familiarity with configuration management software such as Puppet and Chef
  • An understanding of large-scale, complex systems from a reliability perspective
  • A passion for solving problems using open source software
  • Experience with Cloud Computing platforms (particularly AWS) is a plus
  • Experience working professionally with Ruby on Rails
  • Solid competency with SQL (ideally in a federated database environment; MySQL a plus)
  • Contributions to open source
  • Deep network analysis experience is a plus
  • Strong Linux system-level analysis capabilities (Ubuntu a plus)
  • Knowledge of and experience with highly available, scalable architectures for services deployed across multiple regions is a big plus