Engineer - Site Reliability
SUMMARY: This position will focus on monitoring, deployment stability, and system reliability. You will utilize your strong operations background and scripting skills to instrument our monitoring platforms to detect abnormalities before they become a problem.
DUTIES:
- Develop monitoring and notification policies for production applications.
- Own operability, scalability and performance processes.
- Maintain performance thresholds to reduce alert noise and increase reliability, responsiveness, and accuracy.
- Document alerts and define procedures for resolving them.
- Facilitate post-incident reviews to identify areas for improvement.
- Work with the deployment team to ensure parity between environments as well as a stable build and release cycle.
- Help oversee and maintain Jenkins instances and automated/scheduled jobs.
- Work with engineering teams to understand system architecture, identify single points of failure, and design a reliable production environment.
EDUCATION/EXPERIENCE/LICENSURE:
- B.S. in Computer Science or related field preferred.
- 5 years of overall related experience.
- Experience in Linux systems administration, Ubuntu preferred.
- Experience setting up and modifying system monitoring (Nagios, New Relic, and CloudWatch preferred).
- Scripting language experience (Bash, Perl, or Python preferred).
KNOWLEDGE, SKILLS AND ABILITIES:
- Knowledge of Redis, MySQL, Cassandra, HAProxy, ELK, Play, Tomcat, Solr, Elasticsearch, or JVM tuning is a plus.
- Provisioning automation experience is a plus.
- Comfortable working in an environment of frequent, incremental code testing and deployment.
- Comfort with collaboration, open communication, and reaching across functional and organizational boundaries.
- Desire to learn and apply new technologies is required.
- Ability to work effectively within a team environment is required.
ADDITIONAL INFORMATION:
- This position will require participation in a 24/7 on-call rotation as part of an escalation policy.
- Occasionally, this position will require more than 40 hours of work per week including evenings, nights, and weekends as needed.