Site Reliability Engineer
Play a part in revolutionizing how people use their computers and mobile devices. Create groundbreaking technology for algorithmic search, machine learning, natural language processing, and artificial intelligence. And work with the teams creating the most scalable big-data systems in existence.
Key Qualifications
- Experience managing Linux systems in a 24/7 production environment.
- Ability to program in Python, Ruby or Perl highly preferred.
- Working knowledge of multi-tier applications and their dependencies including load balancing, TCP/IP networking, web services, LDAP and DNS.
- Proficiency with web server administration including Apache and Nginx highly preferred.
- Knowledge of database support and administration including MySQL, Postgres and HBase.
- Experience with monitoring tools such as Nagios, Splunk and Munin highly preferred.
- Ability to develop and maintain automation for system administration, provisioning, support and application management tasks, with the aim of adding value, enabling support teams and users, reducing costs and increasing business agility.
- Experience with Puppet, Chef or Ansible highly preferred.
- Excellent interpersonal and communication skills demonstrated through previous projects or assignments (work or academic related).
- Cisco and Juniper network administration experience a plus.
Description
Monitor production, staging, test and development environments for a myriad of applications in an agile, fast-paced organization. Must be an independent, self-directed problem-solver who can handle multiple competing priorities and deliver solutions in a timely manner. Provide incident resolution for all technical production issues. Create and maintain accurate, up-to-date documentation reflecting configuration. Responsible for writing justifications, training users in complex topics, writing status reports, documenting procedures, and interacting with other Apple staff and management. Provide input to improve the stability, security, efficiency and scalability of systems. Determine future needs for capacity and investigate new products and/or features. Strong troubleshooting ability will be used daily: the candidate will independently isolate issues and determine root cause through investigative analysis, including in environments where they have little knowledge, experience or documentation. Administer and ensure the proper execution of the backup systems. Provide 24x7 on-call support to handle urgent escalations. The position will require rotating day, night and weekend shifts.
Education
BS in Computer Science or equivalent program preferred.
Desired Skills and Experience
See application page for details