Desired Skills and Experience
- Build highly available, performant and scalable service infrastructure with AWS
- Design, develop and implement software that improves the stability, scalability, availability and latency of Cookpad
- Solve problems occurring in our highly available production systems and build solutions and automation to prevent them from recurring
- Participate in the operations on-call rotation, triaging and addressing production issues as they arise
- Contribute to internal tools that help us improve our operations processes, manage our infrastructure, and scale our systems
- Engage with product engineering teams to triage production outages and carry forward action items to improve ongoing reliability
- Undertake measured, methodical troubleshooting of complicated systems under pressure
- 3+ years of Site Reliability Engineering/DevOps experience in a Linux-based AWS environment
- 2+ years of professional experience with Ruby on Rails
- Strong written communication skills in English and the ability to develop working relationships with coworkers in locations around the globe
- Knowledge of the fundamentals of the TCP/IP (OSI) model and network architectures
- Strong coding skills in at least one programming language. Cookpad server-side engineers work primarily in Ruby, with smatterings of shell script, Go, and Python
- Familiarity with configuration management software such as Puppet and Chef
- Possess a passion for solving problems using open source software
- Solid foundation in the deployment and management of large-scale Linux systems
- Understanding of large-scale complex systems from a reliability perspective
- Solid competency with SQL (ideally in a federated database environment; MySQL a plus)
- Contributions to open source
- Deep network analysis experience is a plus
- Strong Linux system-level analysis capabilities (Ubuntu a plus)
- Knowledge of and experience with highly available, scalable architectures for services deployed across multiple regions is a big plus