Desired Skills and Experience

  • Support the smooth running and uptime of our external and internal production systems
  • Support the path-to-production for our frequent changes to application code and data
  • Maintain our disaster-recovery and data backup processes
  • Conduct peer reviews of infrastructure and configuration changes
  • Be actively involved in the continuous evolution of our systems and infrastructure, from small tweaks to epic changes
  • Work alongside the wider engineering team to plan and develop new features
  • Participate in our 24/7 emergency on-call rota
  • Ensure no single points of failure are introduced so out-of-hours calls stay rare
  • Management of virtualised Unix / Linux servers. We’ve been using containers in production for years on our own hardware, running SmartOS.
  • Configuration management technologies - we don’t configure servers by hand; instead we use Puppet
  • Production problem solving and performance optimisation - things break or slow down and it’s good to find out why. We accept that nothing can be perfect and value the time spent digging deep to really understand issues
  • Hands-on low-level networking - we run our own servers and network gear in multiple data centres and use dynamic routing protocols to ship traffic between logically isolated networks of virtual machines
  • Good understanding of common network protocols - and the ability to find your way around an RFC
  • Good communicator - we’re all constantly learning and like to encourage the sharing of knowledge across our engineering team
  • Security conscious - you understand the importance of security best practices, know your BEAST from your Heartbleed and know how to establish a robust set of defences
  • Nearly all of our code is written in Ruby and all of our code is checked into Git
  • Ideally some production experience managing relational databases. We run MySQL with databases containing many millions of rows, perform routine online schema changes and periodic DR tests, and rely on master-master replication to keep our site online throughout
  • We use RabbitMQ behind the scenes; having used it before would be a definite plus
  • We run Elasticsearch for in-app user search and also to store many terabytes of log data
  • 33 days annual leave, including public holidays, increasing year on year
  • Family friendly policies
  • Childcare vouchers
  • Professional development and training
  • Contributory Pension
  • Private Health Insurance
  • Group Life Assurance
  • Income Protection
  • Cycle to Work scheme