Site Reliability Engineers are directly responsible for the availability of NetSuite’s customer-facing solutions. They monitor the applications, react to problems, proactively address issues before they become problems, and build tools to constantly improve availability, performance, uptime and response time. Site Reliability Engineering is a global team ensuring NetSuite exceeds its Service Level Commitment 24x7x365.
Responsibilities:
Keep the customer-facing site running
Own all alerts and escalations in the customer-facing production environment
Automate manual tasks
Use the SRE toolset to identify, resolve or escalate issues in production
Build effective monitoring that evolves with the product
Work closely with development engineers who build the product
Interface with Customer Support
Build, test and run Disaster Recovery procedures
Gain familiarity with NetSuite solutions and customer needs
Work to constantly increase the number of issues resolved directly by SRE
Minimum Qualifications:
Experience with Unix or Linux
Experience with networking
Database knowledge is desirable
3-4 years of experience working in a large-scale production operations environment providing mission-critical services to customers
Computer Science degree
Shell scripting
Good troubleshooting skills
Work quickly and accurately under pressure in time-critical situations
A self-starter who takes pride in job ownership and is always thinking of innovative ways to improve efficiency and effectiveness
Desired Skills and Experience:
See application page for details