AI/ML - Site Reliability Manager, Siri Knowledge Platforms
With Apple in London - GB
Posted on April 18, 2021
About this job
Job type: Full-time
Role: System Administrator
Industry: Consumer Electronics
Company size: 10k+ people
Company type: Public
Play a meaningful role in revolutionizing how people use their computers and mobile devices, build groundbreaking technology for algorithmic search, machine learning, natural language processing and artificial intelligence, and work with the teams building the most scalable big-data systems in existence.
As part of this team, you will be the point person for hiring several people to grow the team, and you will be the front-line manager for the AI/ML SRE function in London. As a working manager, you will monitor production and staging environments for a myriad of applications in an agile and dynamic organization. You will use strong troubleshooting skills daily while striving to improve the stability, security, efficiency and scalability of all production systems; a successful engineer isolates issues and resolves their root cause through investigative analysis. The role also requires building and maintaining accurate, up-to-date configuration documentation, providing code reviews, training and mentoring staff, writing status reports, and interacting with other Apple employees and management. The ideal candidate is a focused, independent problem-solver who can deftly handle multiple competing priorities and deliver solutions in a timely manner.
Skills & requirements
- Experience building and managing small, highly agile teams.
- Active participation in the day-to-day stability of a 24/7 global service.
- Sophisticated knowledge of Kubernetes, containerization systems, and public cloud infrastructure.
- Proficiency programming in Go, Python, or a similar language to automate tasks.
- Experience with monitoring tools such as Prometheus.
- Bachelor’s degree in engineering, computer science, or a related field, or equivalent work experience.
- Working knowledge of multi-tier applications and their dependencies including load balancing, TCP/IP networking, web services, LDAP and DNS.
- Demonstrated history of developing and maintaining automation for infrastructure and application management with configuration tools such as Puppet, Ansible, or Chef.
- Proficiency with web server administration including Apache and Nginx.
- Knowledge of database design, support and administration including Postgres, MySQL, and HBase.
- Network administration and troubleshooting.
- Good interpersonal skills shown through previous projects or assignments.