Sr. Site Reliability Engineer, Cloud Test Infrastructure
- Plan, execute, and manage strategy for test cluster operations, improving both cost efficiency and reliability.
- Build new tools and scripts to improve application performance, monitoring, and recovery.
- Help oversee and automate cloud infrastructure, systems scalability, and systems/network security.
Key Qualifications
- 5+ years in data center engineering and operations, ideally in a team lead role.
- Working knowledge and practical experience in increasing technology footprints, ensuring uptime, and working with network, security, and systems engineers to produce quality solutions.
- Minimum 3 years of experience with scripting (Python, shell, Ruby).
- Experience with open-source continuous integration and configuration management tools such as Jenkins, Chef, and Puppet.
- Experience with Log Management tools such as ELK Stack, Splunk, MongoDB, etc.
- Exposure to HVAC management, power supply, UPS, cooling/heating.
- Experience with monitoring tools like Nagios and Ganglia.
Description
Apple’s Software Automation Platform team provides cloud-based testing services for all software contributions to iOS, OS X, tvOS, and watchOS. The team operates a large-scale onsite test cluster that is poised to scale to support ever-growing business needs. The Sr. Site Reliability Engineer will be responsible for operating, managing, and scaling a sophisticated test cluster environment. This is a unique opportunity to join an early-stage, high-impact team.
Education
BA/BS degree in computer science or equivalent field with 5+ years of professional experience
Desired Skills and Experience
See application page for details