Background 

Our SRE team proactively ensures the stability, resilience and scale of our services through automation, testing and engineering. We build on expertise in systems / operations (OS & DB), cloud infrastructure (AWS), pipeline / release engineering (TeamCity), software development and stress / load testing to make sure our services are available 24 hours a day, seven days a week.

We’re looking for engineers with a passion for infrastructure and delivery to join the team, who are equally happy:

  • working with developers to ensure a principled approach to delivering change in a safe and secure way
  • working with third parties to ensure our comms are reliable
  • working with other SREs to hit our service level objectives and prove our systems and environments

The ideal candidate will strive for continual improvement by contributing and assessing new ideas and innovations to meet short-term and longer-term goals, while accepting responsibility for the day-to-day health of our environments.

Responsibilities

You will work in our SRE team, or embedded in our engineering teams, to deliver our SRE mission:

  • Change management and delivery pipeline into production
  • Ensure safety, predictability, repeatability and auditability of all build and deploy processes
  • Enabling ownership by platform and application engineers of tech-specific build plans
  • Enabling maximum velocity without violating service level objectives
  • Monitoring, alerting, SLO tracking
  • Proactively manage delivery of service level objectives
  • Detection / early warning / self-heal
  • On-call management
  • Facilitate emergency / incident response
  • Create, maintain and test for recovery (backup & restore, infra automation etc.)
  • Provisioning / automating deployment infrastructure
  • Demand forecasting and capacity management
  • Efficiency and cost management
  • Performance and scalability of the services
  • Ownership of some cross-cutting implementation, such as logs / metrics infrastructure
  • Automation of security checks, break-glass procedures, etc.
  • Provide a level of audit and control to security personnel

Desired Skills and Experience

  • Software development experience: ideally Java / JVM, but not essential; JavaScript, Python and Bash are all beneficial
  • AWS expertise; familiarity with core services (S3, EC2, ELB, ASG) and CloudFormation
  • Good understanding of traditional ops areas of expertise: Linux, Disk I/O, Networking, VPNs
  • Good familiarity with Docker and the container ecosystem
  • Continuous delivery - principles and pragmatics of dealing with build pipelines, artefact repositories, zero-downtime deployment and so on
  • Proving resilience via failure injection (Chaos Monkey), and scalability via load and stress testing
  • Experience with any of the following: CoreOS, ELK, Prometheus, Elasticsearch, PostgreSQL, PagerDuty, Gatling, JMeter, Kubernetes
  • Some understanding of iOS or Android also beneficial
  • Sensitivity to (but also boldness to influence) culture and behaviour across an organisation

  • We raised US$70m in 2016 to see us through to launch and beyond
  • The Starling team are a mix of technologists, entrepreneurs, designers, brand and customer experts, biz ops managers, and strategists, all working together to deliver our vision
  • There are currently between 60 and 70 people working in the office on any given day
  • There isn’t an IT/Engineering department – talented engineers are just a core part of the team
  • Doing the right thing for customers trumps all, and as such we take our regulatory, conduct and ethical responsibilities very seriously

  • Passion for what we do
  • Belief in how we’re different
  • A can-do attitude, ready to tackle challenges laterally
  • The ability to communicate our vision internally and externally
  • Innovative thinking