As a Site Reliability Engineer in Xactly’s San Jose office, you will fill a mission-critical role ensuring that our complex systems are healthy, monitored, automated, and designed to scale. You will use your background as an operations generalist to work closely with our development teams, from the early stages of design all the way through identifying and resolving production issues. The ideal candidate is passionate about an operations role that involves deep knowledge of both the application and the product, and believes automation is a key component of operating large-scale systems.
Desired Skills and Experience
- Serve as the primary contact responsible for the overall health, performance, and capacity of one or more of our Internet-facing services
- Obtain complete knowledge of our complex applications
- Assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and constant growth
- Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale UNIX environment
- Work closely with development teams to ensure platforms are designed with “operability” in mind
- Function well in a fast-paced, rapidly changing environment
- Participate in a 24x7 on-call rotation for second-tier escalations
Required Skills:
- UNIX/Linux systems knowledge/administration background
- Troubleshooting skills that span systems, network, storage, and code
- Basic shell scripting skills
Preferred Qualifications:
- 5+ years in a UNIX-based large-scale web operations role
- Programming skills (Python, Perl, Ruby, etc.)
- Experience with web-based Java/J2EE architectures and JVM configuration
- Understanding of configuration management tools such as Chef, Puppet, etc.
- Knowledge of most of the following: data structures, relational and non-relational databases, networking, Linux internals, filesystems, web architecture, and other related topics
- Production experience with MapReduce, Hadoop, Hive, JBoss, and Tomcat
- Knowledge of load balancers such as HAProxy, Cisco ACE, etc.
- Strong interpersonal communication skills (including listening, speaking, and writing)
- Ability to work well in a diverse, collaborative, and cross-functional team environment (SREs, Engineers, Product Managers, etc.)
Benefits and Perks:
- Flexible time off (vacation, sick, volunteer = your choice!)
- Corporate discounts
- Generous insurance policies (pets included!)
- Tuition reimbursement
- Money for fitness programs
- End-of-month surprises, contests, BBQs, parties & reward vacations
- Kitchen stocked with daily tasty snacks and drinks
- Free parking & commuter benefits
- 401k & employee match