Senior Operations Engineer - Leading SaaS product at FreeAgent (Edinburgh, UK) (allows remote)

Desired Skills and Experience

Support the smooth running and uptime of our external and internal production systems
Support the path-to-production for our frequent changes to application code and data
Maintain our disaster-recovery and data backup processes
Conduct peer reviews of infrastructure and configuration changes
Be actively involved in the continuous evolution of our systems and infrastructure, from small tweaks to epic changes
Work alongside the wider engineering team planning and developing new features
Participate in our 24/7 emergency on-call rota
Ensure no single points of failure are introduced so out-of-hours calls stay rare
Management of virtualised Unix / Linux servers. We’ve been using containers in production for years on our own hardware, running SmartOS.
Configuration management technologies - we don’t configure servers by hand; instead we use Puppet
Production problem solving and performance optimisation - things break or slow down and it’s good to find out why. We accept that nothing can be perfect, and we value the time spent digging deep to properly understand issues
Hands-on low-level networking - we run our own servers and network gear in multiple data centres and use dynamic routing protocols to ship traffic between logically isolated networks of virtual machines
Good understanding of common network protocols, and the ability to find your way around an RFC
Good communicator - we’re all constantly learning and like to encourage the sharing of knowledge across our engineering team
Security conscious - you understand the importance of security best practices, know your BEAST from your Heartbleed and know how to establish a robust set of defences
Nearly all of our code is written in Ruby, and all of it is checked into Git
Ideally some production experience managing relational databases. We run MySQL, with databases containing many millions of rows. We perform routine online schema changes and periodic DR tests, and rely on master-master replication to keep our site online throughout
We use RabbitMQ behind the scenes; having used it before would be a definite plus
We run Elasticsearch for in-app user search and also to store many terabytes of log data
33 days annual leave, including public holidays, increasing year on year
Family friendly policies
Childcare vouchers
Professional development and training
Contributory Pension
Private Health Insurance
Group Life Assurance
Income Protection
Cycle to Work scheme