Site Reliability Engineer

With Apple in Cupertino CA US

Posted on April 07, 2021

About this job

Job type: Full-time
Role: System Administrator
Industry: Consumer Electronics
Company size: 10k+ people
Company type: Public


cassandra, java, amazon-web-services

Job description

Apple's Applied Machine Learning team has built systems for a number of large-scale data science applications. We work on high-impact projects that serve various Apple lines of business. We use the latest in open source technology and as committers on some of these projects, we are pushing the envelope! Working with multiple lines of business, we handle many streams of Apple-scale data. We bring it all together and extract the value. We do all this with an outstanding group of software engineers, data scientists, SRE/DevOps engineers and managers.

Monitor production, staging, test and development environments for a myriad of applications in an agile and multifaceted organization. You are an independent, self-directed problem-solver who can handle multiple competing priorities at once and deliver solutions in a timely manner. Provide incident resolution for all technical production issues. Create and maintain accurate, up-to-date documentation reflecting configuration; you are also responsible for writing justifications, training users in sophisticated topics, writing status reports, documenting procedures, and interacting with Apple staff and management. Provide guidance to improve the stability, security, efficiency and scalability of systems. Determine future needs for capacity and investigate new products and/or features. Strong troubleshooting ability will be used daily; you will take steps on your own to isolate issues and resolve root causes through investigative analysis, including in environments where you have little knowledge, experience, or documentation. Administer and ensure the accurate execution of the backup systems. Provide 24x7 on-call support to handle urgent critical issues.

Skills & requirements

  • Experience managing large-scale Cassandra and Elasticsearch clusters
  • Experience improving application security and resolving security vulnerabilities
  • Expertise in configuration management (such as Ansible, Salt) for deploying, configuring, and managing servers and systems
  • A passion for automation, creating tools using Python, Java or other JVM languages
  • Experience deploying and managing CI/CD pipelines
  • Experience managing infrastructure in AWS
  • Strong experience in handling distributed computing systems, e.g., NoSQL, Cassandra, Hadoop
  • Strong expertise in troubleshooting sophisticated production issues
  • Experience handling data ingestion pipelines for large-scale big data infrastructure
  • Expert understanding of Unix/Linux based operating systems
  • Excellent problem solving, critical thinking, and interpersonal skills
  • The candidate should be adept at prioritizing multiple issues in a high pressure environment
  • Should be able to understand sophisticated architectures and be comfortable working with cross-functional teams
  • Ability to conduct performance analysis and troubleshoot large scale distributed systems
  • Should be highly proactive with a focus on improving the uptime and availability of our mission-critical services
  • Comfortable working in a fast paced environment while continuously evaluating emerging technologies
  • The position requires solid knowledge of secure coding practices and experience with open source technologies

BS in computer science with 7-10 years of experience, or MS plus 5-7 years of experience, or related experience.
