We’re looking for a seasoned Site Reliability Engineer to ensure RealSelf is fast, reliable, and available for millions of users. You will work in a close-knit team to proactively monitor and improve end-to-end system performance and to identify bottlenecks and potential failures throughout our infrastructure.

Responsibilities:

  • Ensure our platform exceeds goals for availability, capacity, efficiency, scalability, and performance
  • Drive the company through “Disaster Recovery Tests”, where we manually turn down pieces of infrastructure to test RealSelf’s overall resilience to failures
  • Proactively monitor and improve end-to-end system performance, and identify bottlenecks and potential failures
  • Develop and maintain scalable alerting tools for debugging and monitoring
  • Manage the systems and processes that RealSelf engineers use to package, test, and deploy their software

Desired Skills and Experience

See application page for details