Why is Site Reliability Engineering important at Airbnb?

Site reliability engineers (SREs) are responsible for the overall reliability of Airbnb infrastructure and products. SREs design and implement the tools that automate building reliable systems.

What are some examples of Site Reliability Engineering work at Airbnb?

- Work with software engineering teams on design and implementation choices for large-scale distributed systems
- Automate as much as humanly possible
- Always configure as code
- Bring ideas to life (i.e., into production)
- Figure out what is going to break and when
- Advocate for reliable design patterns (circuit breakers, graceful degradation, etc.)

Some examples of SRE projects are:

- A web interface to launch EC2 instances from Chef roles (replacing our open-sourced stemcell)
- Standardizing core infrastructure components (MySQL, DropWizard, Elasticsearch, etc.) so they have best practices (monitoring, alerting, etc.) built in
- A custom deploy system, called Deployboard
- An alerts configuration tool for DataDog
- Optica, a tool for keeping track of nodes in an infrastructure

The following experience is relevant to us:

- Experience bringing software to production at high scale
- A knack for writing clean, readable, maintainable code
- An eye for automation and instrumentation
- The ability to decompose complex systems and find failure scenarios
- Great communication skills
- Knowledge of AWS services
- Contributions to open source software

Desired Skills and Experience

See application page for details