Apple Media Products (AMP) Site Reliability Engineer (SRE) - Analytics Applications & Big Data
With Apple in London - GB
Posted on April 18, 2020
About this job
Job type: Full-time
Role: System Administrator
Industry: Consumer Electronics
Company size: 10k+ people
Company type: Public
amazon-web-services, sysadmin, python
The Service Reliability Engineer (SRE) role in Apple Media Products (AMP) requires a mix of strategic engineering and design along with hands-on, technical work. If you have experience as a Systems Administrator and have moved on to DevOps/Automation, we should talk. This SRE will configure, tune, and troubleshoot multi-tiered systems to achieve optimal application performance, stability and availability. We work closely with the systems engineers, network engineers, database administrators, monitoring team and information security team. For this position, strict application security and high availability requirements should be balanced to achieve the best solutions. If you are ambitious with a real passion for excellence, quality and detail, look no further. If you enjoy closely partnering with the development engineers in addition to working on support operations, then this is the role for you. Here you will work within the team to aid in architectural design and assist with the implementation of complex features.
- Engage and improve life-cycle of service from inception and design, to deployment, operation, migration and sunsets.
- Ensure Service level SLAs are met.
- Experience working with different teams to coordinate and execute high level projects.
- Write, review and develop code and documentation that solves the hardest problems that live on some of the largest and most complex systems in the world.
- Real passion for quality and automation, an ability to understand complex systems and a desire to constantly make things better.
- Set priorities and work efficiently in a fast-paced environment.
- Measure and optimize system performance.
- Strong interpersonal skills.
- Demonstrate ability to deliver results on time with high quality.
Skills & requirements
- 2-5+ years of managing services in a large scale *nix environment.
- Proven understanding of DNS, Load Balancing, TCP/IP, SSL and Linux.
- Proficient in scripting languages like Perl, Python, Shell etc.
- Deep understanding and experience in one or more of the following - Docker, Mesos, AWS, Ansible, Puppet, Chef.
- Deep understanding of J2EE application servers.
- Experience with and understanding of Scaling, Capacity Planning and Disaster Recovery is important.
- Should have On-call experience.
- Experience using monitoring solutions like SNMP, Nagios, Zabbix etc.
- Familiarity with Splunk or other log aggregation tools.
- Experience with software, frameworks and APIs.
- Nice to have - Experience handling Big Data environments such as Kafka, Hadoop, Spark, Cassandra, ELK etc.
BS in engineering, computer science or other technical disciplines plus 2-5+ years of related experience.