Site Reliability Engineer (SRE) - Cloud Technologies

With Apple in Austin, TX, US

Posted on November 27, 2019

About this job

Job type: Full-time
Role: DevOps, System Administrator

Technologies

devops, scrum, agile

Job description

This position can be located in the Santa Clara Valley (CA) or Austin (TX). This role will be responsible for designing, building, running, and monitoring public and private cloud infrastructure that supports a variety of mission-critical services. This is a highly technical, hands-on role that requires expertise in supporting systems at enterprise scale. The candidate will deliver innovative solutions in these key areas:

  • Engineering: continuously optimize secure, scalable and performant security tools and services
  • Reliability: drive fault detection and correction, performance and uptime at global scale
  • Monitoring: instrument systems to gain visibility into how they are performing at any time
  • Automation and orchestration to enable:
      - Accelerated infrastructure, application and software configuration deployment
      - Automated response to alerts or indicators of performance issues
      - Infrastructure as code

  • Build, engineer and support cloud platform IaaS and PaaS services
  • Partner with application teams to provision scalable workloads reliably across distributed compute resources
  • Provide engineering and operational support for distributed systems and network-based information security tools, including configuration management and provisioning
  • Implement and maintain security controls
  • Work closely with development teams to understand application performance and behavior patterns, so that issues can be proactively monitored, tuned and corrected before they cause impact
  • Identify opportunities to improve the reliability, performance and security of security tooling
  • Develop tools and automation to eliminate manual and repetitive work

Skills & requirements

  • 5+ years of experience managing services in a distributed, mission-critical *nix environment
  • Experience supporting infrastructure and services in public and private cloud environments
  • Expertise with monitoring or log aggregation tools (Prometheus, Splunk, ELK, etc.)
  • Experience building and supporting containerized application technologies such as Docker and Kubernetes
  • Familiarity with CI/CD tools and deployment processes
  • Working knowledge of network protocols and network-based services, including routing and network load balancing
  • Experience with failure testing and chaos engineering
  • Experience with virtualization technologies
  • Solid understanding of Linux/Unix system internals, including kernel tuning
  • Solid understanding of storage systems, including network filesystems
  • Proficiency in programming languages such as Python, Java, Ruby, Perl or Go, and with build tooling such as Makefiles, for building automation or integrating with APIs
  • Solid understanding and experience with centralized configuration management, coordination and provisioning technologies, such as Ansible, Chef, Puppet, etc.
  • Excellent communication skills; must be capable of working with cross-functional technical and business teams and with varying levels of management
  • Experience implementing and working with open source projects
  • Understanding of Agile methodologies such as Scrum, and the ability to work in a fast-paced environment
  • Strong project management skills, including excellent presentation skills
  • Must be capable of writing detailed solution specifications, diagrams, best practices/standards documentation, operating procedures, test plans/test reports, etc.

Bachelor of Science in Computer Science, or 4+ years of equivalent experience