We are seeking a Site Reliability Engineer who’s eager to work in an environment that is fast-paced, complex, and extremely large. This engineer will need to be a team player and work effectively with other members of SRE as well as other groups within Apple.

Description

Own the application and all aspects of it in production including the user experience

Work closely with developers to support new features, services, and releases, and become an expert in our services

Monitor site reliability and performance

Scale infrastructure to meet demand

Troubleshoot site-down issues

Continuously monitor and improve the quality of our infrastructure

Develop automation tools

Document system design and procedures

Participate in on-call rotation

Education Details

Bachelor’s degree in Computer Science or relevant industry experience

Key Qualifications

Minimum of 3 years of experience supporting internet-facing production services and distributed systems

Extremely organized, detail-oriented, and thorough in every undertaking

Able to balance multiple tasks and projects effectively and quickly adapt to new variables

Demonstrated problem-solving ability utilizing creative and innovative thinking but also adhering to

Self-motivated and eager to learn

Professional and open-minded attitude

Able to work closely with other team members as well as work independently

Additional Requirements

Solid understanding of: Common internet protocols (TCP, UDP, SSH, HTTP(S), etc.), Configuration management tools, Supporting tiered applications and related concepts such as load balancing, Monitoring services in production, CDNs

Demonstrated proficiency with: Linux systems and associated tools/technologies, Troubleshooting networked services, At least 2 scripting languages (Bash, Ruby, Perl, etc.), Execution of changes in production

Pluses: Direct experience with hardware load balancers, Experience with a large-scale internet presence, Experience in code deployments

Desired Skills and Experience

See application page for details