Site Reliability Engineering at OmniTI

The OmniTI Ops team is a flexible and progressive group. We work closely with developers, DBAs, and client teams to help them manage availability and performance in the midst of constant change. We are not risk averse; instead, we strive to understand why things fail and the true impact of those failures, so that we can empower others. Collaboration is a cornerstone, and we understand that being friendly and outgoing is key to making that work.

The SRE role is highly technical and requires a thorough understanding of all components of a modern web application stack, including front-end, networking, and systems-level knowledge. In this role you will work with clients to design, build, and operate reliable and scalable services in the cloud, on our custom hosting platform, or in their data center. You are up to date on current cloud technologies and are as comfortable at the whiteboard as you are on the command line. You will also help support our internal infrastructure and teams, and provide systems consulting, open source product development, and data center infrastructure support for our customers.

Desired Skills and Experience

No one knows it all, but these are the kinds of things we’re looking for:

Experience with cloud and virtualization technologies: AWS, VirtualBox, VMware, KVM, zones/containers, Vagrant, Docker
Excellent troubleshooting skills with the ability to dive deep into all aspects of the stack to identify and fix problems
Strong background in web server technologies such as Apache, HAProxy, nginx
Familiarity with technologies such as Apache Traffic Server or Varnish, and a good working knowledge of the issues that arise when implementing web caching
Strong knowledge of IP networking protocols
Programming/scripting experience in Ruby, Python, bash, Perl and/or JavaScript
Experience with configuration management tools such as Chef, Puppet, or Ansible
Familiarity with version control systems such as Git or Subversion, from both an end-user and an administrator perspective
Exposure to dynamic tracing tools such as DTrace, ktrace, or SystemTap
In addition to an in-depth understanding of Linux, a background in multiple Unix-based operating systems, such as illumos, Solaris, and OpenBSD, is a big plus

You must be willing to share in an on-call rotation and work diligently to eliminate sources of operational disruption. You will need to thrive on the edge of control and yet be a stabilizing force amid the challenges of high-scale operations and rapid-fire context switching. You won’t just be working on our infrastructure; you’ll also be expected to help our clients with broken, under-performing infrastructure, turning it into something that “just works”. It won’t be easy, but you’ll have the ability to push the edges of what technology can do.

If you contribute to an open source project, have a blog, or are involved in technology in some other way, we would love to hear about it when you write to us!