Lead Site Reliability Engineer

About The Job

The SRE Team Lead doesn’t just fix websites. You will oversee all systems operations initiatives, including designing the company-wide systems layout and network architecture and building out a multi-datacenter managed hosting platform. On top of that, you’ll lead your team into the field and help our clients take their broken, failing infrastructures and turn them into something that “just works”. You will have direct input into the business you work on, and you’ll be responsible for mentoring the people on your team.

About You

Strong understanding of web application architecture fundamentals, including TCP/IP, HTTP, and caching strategies at all layers
Experience with internal core systems, such as, but not limited to, DNS, LDAP, NTP
Familiarity with Amazon, Joyent or similar cloud infrastructure
Ability to translate technical needs into business plans, and vice versa
Familiar with all aspects of system administration on Illumos/Solaris and Linux
Excellent communication skills, both written and verbal
Ability to remain comfortable and calm in the midst of chaos

Bonus

Experience in “continuous deployment” environments
Experience working on multiple layers of the stack (OS, DB, Programming, etc.)
A history of working and sharing with external / OSS communities
Hands-on experience with system automation tools like Chef or Ansible

We work on systems that range from very large to small, but they are all mission critical to our clients, and they include a wide array of technologies. You should be comfortable getting very hands-on in helping make things go.

Note: This position is located in Fulton, Maryland. Remote work is not an option, but relocation is available.

Desired Skills and Experience

See application page for details