Site Reliability Engineer - Lab Engineering
Netflix is almost everywhere. Serving over 80 million customers in 190 countries on devices such as Smart TVs, set-top boxes, and software on phones, tablets and computers, Netflix is striving to deliver the vision of #NetflixEverywhere. The engineers who create this service require a world-class environment and access to thousands of devices on which to build and perfect the Netflix experience.
Netflix requires talented and driven individuals who can take our internal development and certification environment to the next level. Our goal is to simplify and automate everything possible within Lab Engineering and turn it into “a small matter of code”. Finding and permanently resolving pain points for our partners and customers is our mandate. How we do it is where the fun lies.
“Lab” often evokes the image of people in white lab coats working in a sterile environment. For us, it is just the term we give to the places where we run tests. Think of Lab as experimenting, trying new and innovative things, and having a lot of fun in the process.
Role Responsibilities
- Develop effective tooling, alerts, and responses to both identify and address reliability risks
- Participate in the on-call rotation with other members of the Lab Engineering team
- Drive issue resolution with partner product engineering and certification teams
- Evangelize best practices around collaboration, reliability, security and performance to all partner teams
Minimum Requirements
- Effective root cause identification, triage and mitigation
- Experience with configuration and troubleshooting of Linux, Java, Tomcat, and other middleware technologies
- Understands mid- to large-scale complex systems from a reliability perspective
- Scripting abilities in Python, Perl, Go, or JVM-based languages
- Strong communication skills and the ability to engage partner teams effectively
- Passion for resolving reliability issues and identifying strategies to mitigate them going forward
We are looking for individuals with skills in one or more of the following groups of Winning Attributes. You may have all of the attributes in one group or a few in each, and if you are strong in all, you are The One whose coming has been foretold!
Winning Attributes (Generalist * common to all)
- Automation mindset: if you can automate it, do it.
- Experience with Cloud Computing platforms (particularly AWS) a plus
- Deep network analysis experience a plus
- Strong Linux and/or container-based system-level analysis capabilities
- Familiarity with devices and software interaction (firmware, custom OS)
- Innovative and creative problem-solving skills (some may even say “out there”)
- Experience with mechanical, electrical or robotic systems a plus
Winning Attributes (Development)
- Mid- to large-scale project development skills in C, C++, and/or Java (embedded/proprietary environments a huge plus)
- Understanding of development and deployment processes
- Effective debugging methodologies in a mixed device environment
- Ability to emit, collect, analyze and act upon metrics data
- Web UI development for tooling a plus
Winning Attributes (Networking)
- Experience with Cisco, Arista, Aruba, Quanta and/or other networking families
- Configuration, monitoring and debugging of networking issues
- Programming against exposed networking APIs for the above vendors