What are we building?

  • Cloud-scale APIs for speech recognition, natural language processing, and predictive analytics
  • Highly scalable, distributed microservices that run across 10,000 CPU cores in an architecture designed for 1 billion minutes per month
  • A seamless developer experience and showcase UX for voice/speech analytics applications

What will you be working on?

Cloud-scale distributed systems. You will make sure that our automation keeps running and our services are never down. You will shorten our deploy time. You will work in a highly agile, fast-moving environment with a complex, mixed AWS and data center infrastructure.

Who are you?

A Senior Site Reliability Engineer with a passion for cloud-scale microservice architecture who can take us to a billion minutes per month and beyond.

What will your responsibilities be?

  • Design, test and implement solutions to the hardest Ops problems in a mixed public/private cloud and data center environment
  • Drive the next level of automation for scalability, reliability and uptime
  • Champion the best security practices for our systems
  • Work with the Ops and Dev teams to improve quality, uptime and monitoring
  • Continuously improve Ops tools and processes

Desired Skills and Experience

  • At least 7 years of experience and a track record in IT Ops and cloud maintenance
  • Strong proficiency in scripting with Python, Ruby, Bash, and Perl
  • Demonstrated ability to provide technical ownership in a dynamic, fast-paced environment
  • Hands-on experience with Chef, Puppet, SaltStack, Ansible or another configuration management system. We are a Chef shop
  • Prior experience with Linux, networking and routing, IT security, monitoring and metrics, MongoDB, Elasticsearch, ELK, and ZooKeeper preferred
  • Working knowledge of PCI, SOC, ISO, and HIPAA compliance requirements is beneficial