Make Tumblr fast, reliable, and available for hundreds of millions of users globally. As an SRE-Ops Engineer, you are a software developer and systems maven with a love of highly performant, fault-tolerant, massively distributed systems.

What You’ll Do:

  • Manage the availability, scalability and performance of Tumblr platforms
  • Create the tools and infrastructure leveraged by the rest of the Tumblr engineering teams
  • Diagnose and repair network, application, and hardware bottlenecks
  • Test and tune network, hardware, and software configurations to maximize performance
  • Deploy and manage monitoring and diagnostic tools
  • Guide our product and platform teams to keep new features fast and stable
  • Front-line defense on a daily rotation 10am-6pm (approximately 1 day per week)
  • Front-line defense on a weekly overnight rotation 12am-10am (1 week at a time, no daytime work during rotation)

What We’re Looking For:

  • Hunger to solve the problem. No stone left unturned while searching for the solution!
  • Experience in troubleshooting large-scale distributed systems
  • Experience scaling high-traffic web sites
  • Experience with Unix systems administration, including solid scripting skills
  • Experience in data structures and algorithms
  • Experience and willingness to perform on-call duties
  • Smarts, humility, and equal willingness to learn and teach
  • A sense of ownership, initiative, and drive

Tools We Like:

  • Nginx, Varnish and HAProxy
  • Memcached and Redis
  • MySQL (InnoDB)
  • Puppet
  • git and GitHub
  • Ruby, Go, Scala, PHP
  • Asynchronous services and queues like Oozie and Gearman
  • Hadoop, Pig, ZooKeeper, and other Java/JVM projects
  • Nagios, Icinga2, PagerDuty, OpenTSDB
  • OpenStack, Docker, Mesos
