Desired Skills and Experience

  • ~100 servers, mostly hosted on AWS
  • 8 AWS regions, as well as multiple colocated hosting providers
  • Hundreds of public IP addresses
  • 300+ HTTPS requests per second
  • 25+ FTP/SFTP/FTPS logins per second
  • 100+ file transfers per second
  • 4,000 log entries per second
  • 150,000+ metrics
  • 99.9% uptime record
  • Significant experience working with GNU/Linux servers, including a complete understanding of the command line, /proc, services, etc.
  • Comprehensive understanding of networking concepts, including layers, firewalls, DNS, VPN, etc.
  • Proficiency with configuration management tools, such as Chef or Puppet, and fluency with at least one major scripting language.
  • Experience building distributed, failure-resistant architecture, including disaster recovery, backups, failover, etc.
  • Experience with the advanced features of public cloud platforms such as AWS or Azure (we use AWS).
  • Familiarity with large-scale monitoring and analysis systems, such as ELK or Splunk (we use ELK).
  • Complete understanding of how to build secure infrastructure and an awareness of common server security vulnerabilities.
  • Ability to manage a large database at scale (we use MySQL).
  • A history of developing and supporting production infrastructure at a scale equal to or greater than ours. (We talk about our size earlier in the post.)