Site Reliability Engineer, Compute and Storage Group

In this highly visible role, you will ensure that Apple’s world-class Silicon Engineering Group has the infrastructure and tools it needs to engineer and design the world’s most advanced silicon devices and products. You will draw on a deep understanding of building and maintaining Linux compute clusters, storage systems, web infrastructure and applications, database servers, tool/license management, monitoring systems, workflow optimization, and directory services. You will use strong communication skills to work with internal teams, enabling Apple’s world-class product development.

Description

You will support internal engineering teams by enhancing, maintaining, and performance-tuning compute clusters and by planning their capacity. Your role will directly impact the development, enhancement, and maintenance of compute cluster queuing, storage systems, network interconnects, monitoring, the LAMP stack, and load balancing.

Education Details

MS/BS degree or equivalent.

Key Qualifications

Typically requires 5+ years of experience in Linux or UNIX systems administration in a large engineering or R&D environment, with demonstrated skills in the following:

Linux (RHEL/CentOS preferred)

NFS and NAS appliances (NetApp preferred)

Layer 2 / Layer 3 networking (Arista or Cisco preferred)

Scripting in shell, Perl, Python, or Ruby

Revision control systems (SVN, Git, Perforce)

Centralized configuration management (Puppet, CFEngine)

Software/tool compilation and installation

FLEXlm and similar licensing systems

Monitoring systems such as Nagios, Zenoss, and GroundWork

LDAP (OpenLDAP, DSEE, Open Directory)

IPAM with DNS (BIND) and DHCP

Must be analytical and possess strong organizational and problem-solving skills

Desired Skills and Experience

See application page for details