Site Reliability Engineer
My client is looking for a Sr. Site Reliability Engineer to build DevOps tools, maintain and build out the DevOps environment, and handle continuous integration for the team. You will work as part of a team, designing and implementing tools to enforce reliability standards, data reconciliation procedures, and automated monitoring and alerting.

For immediate consideration, please send qualified resumes to Candice Branyon at Cbranyon@judge.com

Location: SF, CA
Duration: 6-12 months with possible extension

Please note: no third parties for this role; W2 only.

This job will have the following responsibilities:
• Scope, design and implement scripts or tools to help reach our systems reliability goals for all business-critical systems.
• Systems will produce operational logs that comply with the logging standards set forth in enterprise NFRs (which may need to be created).
• Logs will be transmitted to and stored in a log aggregation system {Logstash} for event monitoring, reporting, and archival. Logs on the original server will be compressed and archived according to the system's NFRs (which may need to be written).
• Up/Down Alerting: A standard set of system-level monitors will be attached to each system (CPU, Memory, Storage and IO).
• Threshold Alerting: Baselines will be established for the standard monitors, and alerts will be triggered for percent deviations from the baseline. Alerts will be sent to an alert aggregator where a framework for consolidation and escalation has been established {PagerDuty/MIR3} through calls to the CMDB {ServiceNow}.
• All severe events will automatically create a ticket in the incident management system {ServiceNow} for tracking and RCA.
• All monitors will have a heartbeat attached, and alerts will be sent should the monitor itself stop behaving {AppD Agent; cron scripts}.
• Write clear, maintainable, portable, and highly functional code.
• Profile and performance tune code to remove bottlenecks.
• Test and document all code produced.
• Mentor and guide less experienced programmers as needed.

Qualifications & Requirements:
• Minimum seven years of professional programming experience.
• Strong programming generalist with solid code architecture skills.
• Ability to self-govern and prioritize.
• Enthusiasm and initiative.
• Excellent spoken and written communication.
• Commitment to code quality, documentation, and sound testing procedures.
• Expert in Python or an equivalent scripting language.
• Knowledgeable in Linux and Windows environments.
• Experience with ServiceNow APIs (Orchestrator).
• Experience with Elasticsearch/Logstash/Kibana.
• Experience with monitoring tools: Nagios, Nimsoft, AppDynamics, Zenoss.