Lead SRE (Site Reliability) at Cake Solutions - A BAMTECH Media Company (London, UK)

Desired Skills and Experience

Continuously refine monitoring processes, thresholds, and configuration
Work closely with product developers to ensure new features have proper operational support and maintainability, and provide deep technical guidance to development teams
Help design, build, and maintain the cloud-native platform needed to support our growth plans. We do this by managing infrastructure as code and automating as much as we can
Mentor and support team members on production readiness and best practices
Develop software for the purposes of automating, monitoring and maintaining deployed infrastructure and services
Handle high-severity internal or customer incidents, ensuring we meet all SLAs
Help teams create and maintain documentation and runbooks/playbooks
Participate in Scrum processes and ceremonies
Respond to issues and escalations
Participate in on-call rotation
Track record of leading a team of Software or Systems Engineers
Track record of working as a Site Reliability Engineer, DevOps Engineer, or a Software Engineer
Must be able to code and to pick up new programming languages
Experience in at least one scripting language: Python, Ruby, Bash, Perl
Experience in working with infrastructure as code tools such as Puppet, Chef, SaltStack, Ansible, CloudFormation, or Terraform
Track record of working with Linux systems in production
Experience in working with container technologies such as Docker
Experience in working with cloud platforms such as AWS
Experience using Agile practices
Experience with modern open-source infrastructure services and concepts such as Redis, Elasticsearch, Kafka, and Docker
Experience in software development in any language. Our focus languages are Go and Scala.
Experience in working with any functional programming language such as Clojure, Haskell, or OCaml.
