Senior Site Reliability Engineer

San Diego, CA, USA

ServiceNow is changing the way people work. With a service-orientation toward the activities, tasks and processes that make up day-to-day work life, we help the modern enterprise operate faster and be more scalable than ever before. We’re disruptive. We work hard but try not to take ourselves too seriously. We are highly adaptable and constantly evolving. We are passionate about our product, and we live for our customers. We have high expectations, and a career at ServiceNow means challenging yourself to always be better.

What you get to do in this role:

Working closely with systems/network engineers and developers across the company, you will have the following responsibilities:

Operate the Cloud Infrastructure to the highest standards of professionalism and availability
Perform proactive daily system monitoring, including responding to, triaging, troubleshooting and remediating incidents based on Standard Operating Procedures (SOPs)
Use your broad knowledge of systems administration and networking principles to prevent and troubleshoot incidents and to improve SOPs
Verify the integrity and availability of all hardware, server resources, systems and key processes, reviewing system and application logs
Verify completion of scheduled jobs such as backups, clones and migrations, and provide clearance to proceed to scheduled changes
Repair and recover from hardware or software failures. Coordinate and communicate with impacted stakeholders and customers, escalating where appropriate
Maintain and implement operational, configuration, and other SOPs
To ensure the quality of the Cloud service, identify and collaborate with appropriate teams to improve tools and processes such as event monitoring and automation

This is a great opportunity to help us lead the industry in operating the next Cloud Infrastructure.

In order to be successful in this role, we need someone who has:

The ideal candidate will have a strong background and prior experience in Linux systems administration, troubleshooting, and monitoring of cloud infrastructure. The candidate must have good communication skills and solid troubleshooting techniques. Due to shift schedules, the candidate must be able to work well both in a collaborative team environment and individually with less supervision. A Bachelor’s degree in Computer Science or equivalent experience is required.

Excellent knowledge of the Linux command line and systems diagnostics for troubleshooting incidents and events
Working knowledge of MySQL administration
Experience with systems and network performance and availability monitoring
Familiarity with the following technologies is useful:
Oracle, MongoDB, Cassandra or similar technologies
Networking Technologies such as Cisco/Juniper routing, switching and load balancing.
Basic scripting using Bash, Perl, or Python.
ITIL v3.

We provide competitive compensation, generous benefits and a professional atmosphere. This is a very collaborative and inclusive work environment where individuals strong on aptitude and attitude will have an opportunity to grow their professional careers through working with some of the most advanced technology and talented developers in the business.

Desired Skills and Experience

See application page for details