Site Reliability Engineer
We want to bridge the gap between web developers and traditional tech ops by building processes that support our infrastructure (both bare-metal private and public clouds), including management, monitoring, and deployment. This role will focus on enabling scale to keep up with customer demand while providing the best customer experience possible. This role will work side by side with dev teams. We are looking for someone who can work within scrum, bridge the gap between dev and ops, and help define tools, systems, and processes. This role will provide a tech ops perspective on story pick-ups and help facilitate the tasks needed to keep the runway clear for new code. Daily tasks could include dev stand-ups, NOC information hand-offs for code coming down the pipe, building Puppet manifests, verifying that continuous integration and deployments are functioning as needed, and winning the ping pong championship!
Requirements:
- Proficiency in high-level scripting languages (Python preferred) as well as shell environments such as Bash
- Linux (CentOS, Ubuntu)
- Puppet experience (both master-based and headless)
- Building/implementing monitoring for network, server and application status
- Experience with monitoring tools such as Nagios, Zabbix, and Cacti
- Experience with hardware and software firewalls, IPS, WAF, and additional security layers (LDAP, SSO, two-factor authentication)
- Continuous integration, testing, and deployment
- Experience with both RDBMS (MySQL) and NoSQL (Cassandra, Couchbase, MongoDB)
- A desire to automate everything!
- You should dream in technical documentation terms and want to share that knowledge on the magical wiki
- Fear of black boxes
- You should be interested in finding out how everything (including custom, internally coded systems) works
- Bonus points if you get a nervous tic when you find a single point of failure or a manual failover system
Desired Skills and Experience
See application page for details