Site Reliability Engineer
With Domino's Pizza in Ann Arbor, MI, US
Posted on January 13, 2021
About this job
Compensation: $85k - $100k
Location options: Paid relocation
Job type: Full-time
Experience level: Mid-Level
Role: DevOps, System Administrator
Industry: eCommerce, Information Technology, Software Development / Engineering
Company size: 1k–5k people
Company type: Public
Technologies
linux, kubernetes
Job description
The Platform Engineer – Site Reliability Engineering (SRE) is responsible for the overall maintenance and provisioning of the Red Hat Linux environment within eCommerce at Domino’s, across both VMware guest and Kubernetes platforms. This position requires a wide base of knowledge, from basic Linux administration through capacity planning.
Duties and Responsibilities:
- Perform regular operating system patching, rebooting, and remediation of identified security vulnerabilities
- Participate in regular security analysis and operating system hardening requirement discussions
- Ensure platform consistency across stacks and environments before each release cycle
- Ensure base server platforms are upgraded quarterly to the current release (N), or to N-1 where the business requires it
- Ensure services are upgraded quarterly to the current release (N), or to N-1 where the business requires it
- Perform service benchmarking to determine the impact of applying upgrades, tuning parameters, or business requirements
- Provide capacity planning and trend analysis of system and service performance over time
- Ensure a standard platform is available, current, and extensible for both eCommerce and Corp environments
- Ensure server provisioning practices and documentation are current and maintained
- Participate in automation activities related to these functions, managing content in revision control
Qualifications:
- Bachelor’s degree in computer science or equivalent experience
- 5+ years of production application support experience in a high-uptime environment
- 5+ years of UNIX administration experience, including diagnosis of performance issues, package management, load estimation, kernel tuning, and networking configuration
- 5+ years of hosting experience in a large, heavy-traffic environment
- Excellent troubleshooting and analytic skills
- Extensive knowledge of platform management in VMware and Kubernetes
- Ability to write, manage, and execute scripts in languages such as Bash and Python
- Ability to manage content in Bitbucket
- Preferred: experience with middleware tools such as ActiveMQ, RadiantLogic, and PingFederate
- Ability to create systematic and manual operations procedures in both technical and user-friendly language
- Familiarity with process and efficiency enhancements