Desired Skills and Experience
- Investigate, triage, and troubleshoot production problems as they occur
- Create and maintain common and integrated standards with respect to logging, latency, troubleshooting, and monitoring
- Develop and maintain tools used in investigating production problems
- Review and influence the design and standards of the software
- Measure current capacity, predict future capacity needs, and make recommendations accordingly
- Automate deployment, configuration management, quality assurance (including functional and capacity testing), and reaction to problems
- Facilitate continuous integration/continuous deployment
- 3+ years of experience programming in C/C++
- Demonstrated understanding of how production systems are put together, and experience triaging and solving problems in them
- Strong knowledge of Linux systems
- Familiarity with Python
- Familiarity with configuration management tools such as Chef, Puppet, Ansible, or SaltStack
- Practical knowledge of networking protocols such as TCP, UDP, and IP
- Familiarity with monitoring tools such as Splunk, ELK, Grafana, Nagios
- Perl, Java or JavaScript experience
- Experience with virtualization technologies such as Vagrant, Terraform, VMware, KVM
- Knowledge of cloud technologies (OpenStack, AWS, Rackspace, Cloud Foundry, OpenShift, WSO2)
- Experience with big data technologies such as Hadoop, Spark, Cassandra
- Knowledge of containerization technologies such as Docker, Mesos, CoreOS, Kubernetes