Desired Skills and Experience

  • Investigate, triage, and troubleshoot production problems as they occur
  • Create and maintain common and integrated standards with respect to logging, latency, troubleshooting, and monitoring
  • Develop and maintain tools used in investigating production problems
  • Review and influence the design and standards of the software
  • Measure current capacity, predict future capacity needs, and recommend changes accordingly
  • Automate deployment, configuration management, quality assurance (including functional and capacity testing), and response to problems
  • Facilitate continuous integration/continuous deployment
  • 3+ years of experience programming in C/C++
  • Demonstrated understanding of how production systems are built, and experience triaging and resolving problems in them
  • Strong knowledge of Linux systems
  • Familiarity with Python
  • Familiarity with configuration management tools such as Chef, Puppet, Ansible or SaltStack
  • Practical knowledge of networking such as TCP/UDP/IP
  • Familiarity with monitoring tools such as Splunk, ELK, Grafana, Nagios
  • Perl, Java or JavaScript experience
  • Experience with virtualization technologies such as Vagrant, Terraform, VMware, KVM
  • Knowledge of cloud technologies (OpenStack, AWS, Rackspace, Cloud Foundry, OpenShift, WSO2)
  • Experience with big data technologies such as Hadoop, Spark, Cassandra
  • Knowledge of containerization technologies such as Docker, Mesos, CoreOS, Kubernetes