Site Reliability Engineer - Systems
Indeed is looking for an experienced Site Reliability Engineer to join our Production Operations team. Our team helps people get jobs by engineering resilient systems that are fast, fault-tolerant and scalable. These systems provide job search capabilities to over 200,000,000 job seekers who use Indeed every month.
Responsibilities
The Production Operations team is responsible for the resiliency, performance and security of Indeed’s global production infrastructure.
- We provide guidance to the Software Engineering teams and drive best practices for Indeed’s products, which span multiple data centers across five continents.
- We are experts in core infrastructure technologies like load balancers, HTTPd, Puppet, Tomcat, Memcached, RabbitMQ, Elasticsearch, MongoDB, and more.
- We realize that failure is inevitable, so we embrace it and plan for fast recovery, in order to deliver near 100% uptime.
As a Site Reliability Engineer, you’re curious, with deep technical knowledge. You’re a tinkerer and an engineer who uses ingenuity to solve hard problems. You foster a culture of inquisitiveness, collaboration and learning and are able to empathize with others. Your adaptation and evolution are guided by your experiences. You don’t need to be the smartest person in the room, because you know that every interaction is a learning opportunity.
We’re looking for an influential decision-maker who’s ready to take on a high level of ownership and responsibility: a forecaster and problem solver for all of Production Operations and Software Engineering.
Responsibilities
- Develop training and mentor teammates
- Define standards for configuration, monitoring, reliability and performance
- Serve as subject matter expert for multiple proprietary and open source technologies
- Design and implement innovations that improve software engineering velocity, infrastructure resiliency, security, and data availability
- Coordinate and perform major upgrades with zero downtime
- Provide expert perspective regarding the capabilities and limits of the multi-datacenter production infrastructure in software architecture designs
- Influence Software Engineering leadership by motivating improvements to Indeed’s software systems and education
- Solve live performance and stability issues and prevent their recurrence
- Work with highly skilled subject matter experts in a follow-the-sun on-call rotation
Requirements
- Advanced knowledge of Unix/Linux systems. You know how page cache works and feel very comfortable at the command line.
- Ability to write code. You use automation to make your job more efficient.
- Experience with configuration management. You have managed an infrastructure with hundreds or thousands of servers and dozens of technologies.
- Strong networking fundamentals. You understand TCP/IP, subnetting and the difference between socket and connect timeouts.
- In-depth understanding of web operations best practices. You’ve operated websites at scale for years.
- Knowledge of distributed systems. You consider data consistency and availability tradeoffs when designing systems.
- A knack for troubleshooting tough problems. Your high level of ownership and curiosity empower this skill.
- Ability to learn rapidly. You will quickly comprehend our code, open source code, and how it all fits together.
- Meticulous and cautious. You identify and consider all risks, and balance those with performing the task efficiently.
- Comfortable working in a highly collaborative environment. You are comfortable giving, receiving and implementing feedback.
- Solid communicator with great customer service skills.
- Extremely curious about how things work.
Desired Skills and Experience
See application page for details