Desired Skills and Experience
- Continuously refine monitoring processes, thresholds, and configuration
- Work closely with product developers to ensure new features have proper operational support and maintainability, and provide deep technical guidance to development teams
- Help design, build, and maintain the cloud-native platform needed to support our growth plans, managing infrastructure as code and automating as much as we can
- Mentor and support team members on production readiness and best practices
- Develop software to automate, monitor, and maintain deployed infrastructure and services
- Handle high-severity internal or customer incidents, ensuring we meet all SLAs
- Help teams create and maintain documentation and runbooks/playbooks
- Participate in Scrum processes and ceremonies
- Respond to issues and escalations
- Participate in on-call rotation
- Track record of leading a team of Software or Systems Engineers
- Track record of working as a Site Reliability Engineer, DevOps Engineer, or a Software Engineer
- Ability to code and to learn new programming languages
- Experience in at least one scripting language: Python, Ruby, Bash, or Perl
- Experience in working with infrastructure as code tools such as Puppet, Chef, SaltStack, Ansible, CloudFormation, or Terraform
- Track record of working with Linux systems in production
- Experience in working with container technologies such as Docker
- Experience in working with cloud platforms such as AWS
- Experience using Agile practices
- Experience with modern open-source infrastructure services and concepts such as Redis, Elasticsearch, Kafka, and Docker
- Experience in software development in any language. Our focus languages are Go and Scala.
- Experience in working with any functional programming language such as Clojure, Haskell, or OCaml.