Site Reliability Engineer
Make Tumblr fast, reliable and available for hundreds of millions of users all over the world. As a site reliability engineer, you are a software developer with a love of highly performant, fault-tolerant, massively distributed systems.
What You’ll Do:
- Manage the availability, scalability and performance of Tumblr platforms
- Create the tools and infrastructure leveraged by the rest of the Tumblr engineering teams
- Diagnose and repair network, application, and hardware bottlenecks
- Test and tune network, hardware, and software configurations to maximize performance
- Deploy and manage monitoring and diagnostic tools
- Guide our product and platform teams to keep new features fast and stable
What We’re Looking For:
- Experience scaling high-traffic web sites
- Experience with Unix systems administration, including solid scripting skills in Ruby, PHP or Python
- Expertise in data structures and algorithms
- Expertise in troubleshooting large-scale distributed systems
- Smarts, humility, and equal willingness to learn and teach
- A sense of ownership, initiative, and drive
Tools We Like:
- Nginx, Varnish and HAProxy
- Memcached and Redis
- MySQL (InnoDB)
- Puppet
- PHP 5, pushed to its furthest extent
- git and GitHub
- Ruby, Scala and PHP
- Asynchronous services and queues
- Hadoop, Pig, ZooKeeper, and other Java/JVM projects
- Nagios/Icinga, OpenTSDB
Desired Skills and Experience
See application page for details