Hulu is looking for a Senior Site Reliability Engineer to help seed our new team.  As a Senior Site Reliability Engineer, you will work closely with development teams to identify manual processes that are good candidates for automation and to develop that automation.  You will help improve the signal-to-noise ratio of our monitoring and alerting systems and make sure that the root causes of issues are solved via software.  If you take pride in stability and believe that every operational problem is a software problem, this is a great role for you.


Desired Skills and Experience

  • Design, build, or improve systems that focus on the scalability, availability, and efficiency of Hulu’s services.
  • Identify mission-critical problems and solve them via automation and design improvements.
  • Build or improve monitoring and instrumentation to predict future scalability or latency risks and solve them before they manifest as customer-facing issues.
  • Develop best practices with development teams to improve scalability and reliability of Hulu’s services.
  • Design and improve the developer platform and infrastructure so that reliability and availability become an even more natural part of our software development process.
  • Experience as a Site Reliability Engineer or a Software Engineer focused on infrastructure and/or operations.
  • Experience with the building blocks of large-scale systems, including load balancing, fault tolerance, containers, instrumentation, predictive monitoring, etc.
  • Familiarity with commonly available services and tools (AWS, Docker, Redis, New Relic, Heroku, Hadoop, etc.)
  • Strong passion for automation, testing, and code quality.
  • BS in Computer Science or equivalent preferred.
  • Familiarity with one or more of the following: Java, Python, Go.