Desired Skills and Experience

  • Managing, monitoring, and extending our mission-critical web application infrastructure across multiple datacenters in a no-downtime environment.
  • Being part of an on-call rotation. When alarms ring we respond immediately and effectively.
  • Learning from alarms. After an event, we improve our infrastructure, monitoring, or applications so that the same issue does not ring alarms again.
  • Working with application developers and product “customers” to deploy and monitor new services repeatably, at scale, in cloud services.
  • Building visualizations of our data to understand the performance of our systems and to recommend improvements that remove bottlenecks and points of failure.
  • Discovering, testing, and deploying upgrades to, or replacements for, components of our infrastructure, from simple OS packages to database version upgrades to wholesale replacements. (Maybe it’s time to move from Apache to nginx? Or to choose between monit and upstart?)
  • Our stack: Ruby on Rails, PostgreSQL, Node.js, Redis, Graphite
  • Hosting: Rackspace and AWS clouds.
  • Configuration management: Chef
  • Monitoring: StatsD, Graphite, New Relic
  • Extensive experience with Linux system administration. We run Ubuntu.
  • Substantial experience with a programming language like Ruby or Python.
  • Experience developing and monitoring mission-critical web applications.
  • Knowledge of and a passion for repeatable configuration management using tools such as Chef, Puppet, or Ansible. We use Chef.
  • Bonus: experience configuring and monitoring CDNs. We use Fastly.
  • Bonus: experience tuning and administering PostgreSQL databases.