Site Reliability Engineer jobs

Make Tumblr fast, reliable and available for hundreds of millions of users all over the world. As a site reliability engineer you are a software developer with a love of highly performant, fault-tolerant, massively distributed systems.

What You’ll Do:
- Manage the availability, scalability and performance of Tumblr platforms
- Create the tools and infrastructure leveraged by the rest of the Tumblr engineering teams
- Diagnose and repair network, application, and hardware bottlenecks
- Test and tune network, hardware, and software configurations to maximize performance
- Deploy and manage monitoring and diagnostic tools
- Guide our product and platform teams to keep new features fast and stable

What We’re Looking For:
- Experience scaling high-traffic web sites
- Experience with Unix systems administration including solid scripting skills in Ruby, PHP or Python
- Expertise in data structures and algorithms
- Expertise in troubleshooting large-scale distributed systems
- Smarts, humility, and equal willingness to learn and teach
- A sense of ownership, initiative, and drive

Tools We Like:
- Nginx, Varnish and HAProxy
- Memcached and Redis
- MySQL (InnoDB)
- Puppet
- PHP5 at its furthest extent
- git and GitHub
- Ruby, Scala and PHP
- Asynchronous services and queues
- Hadoop, Pig, ZooKeeper, and other Java/JVM projects
- Nagios/Icinga, OpenTSDB

Desired Skills and Experience

See application page for details