Desired Skills and Experience

  • Performing day-to-day operational/DevOps tasks on Wikimedia’s public-facing infrastructure (deployment, maintenance, configuration, troubleshooting)
  • Implementing and utilizing configuration management and deployment tools (Puppet, Kubernetes)
  • Assisting in the architectural design of new services and making them operate at scale
  • Monitoring of systems, services and service clusters, optimization of performance and resource utilization
  • Assisting in or leading incident response, diagnosis and follow-up on system outages or alerts across Wikimedia’s production infrastructure
  • Sharing our values and working in accordance with them
  • 3+ years of experience in an SRE/Operations/DevOps role as part of a team
  • Experience managing geographically distributed, highly available, high-traffic infrastructure based on Linux
  • Comfortable with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.)
  • Experience with the use, maintenance and configuration of monitoring, metrics and logging infrastructure (Icinga/Nagios, Prometheus, Grafana, Graphite, Logstash/Kibana, etc.)
  • Comfortable with shell and scripting languages used in an SRE/Operations engineering context (Python, Go, Bash, Ruby, etc.)
  • Comfortable remotely managing both bare-metal servers and virtualized environments
  • Experience with software and service deployment and package management, including (Debian) packaging as well as container systems
  • Aptitude for automation and streamlining of tasks
  • Strong English language skills and ability to work independently, as an effective part of a globally distributed team
  • B.S. or M.S. in Computer Science or equivalent work experience
  • Track record of open source contributions is a major plus
  • Familiarity with modern distributed container cluster management systems (Kubernetes, Docker Swarm, Mesos, …)
  • Experience with LAMP stack technologies (PHP/HHVM, memcached/Redis, MySQL) - MediaWiki experience is a definite plus
  • Low-level systems troubleshooting and debugging (CPU/memory profiling, C/C++ experience, in-depth Linux knowledge)
  • Experience with advanced distributed storage and database systems (Swift, Ceph, Cassandra, etc.)
  • Design, implement and maintain backup and underlying storage infrastructure, ensuring all Wikimedia mission-critical data is backed up to on-site and off-site storage in an automated, consistent and reliable manner
  • Ensure smooth and reliable operation of the MediaWiki application server platform and its dependencies (Memcached, Redis, etcd, …)
  • Perform platform transformations and migrations towards modernized infrastructure (HHVM to Zend PHP7, bare metal deployments to Kubernetes clusters, active/active multi-data center support, etc.)
  • Design, implement and maintain our metrics, monitoring and logging infrastructure using modern and state-of-the-art tooling (Prometheus, Grafana, Logstash/Kibana)
  • Implement and improve orchestration and automation tooling that eliminates toil and acts as an enabler for the entire SRE team
  • Help keep Wikimedia’s infrastructure secure in an ever-changing, high-velocity environment with staff and volunteers across the world
  • Fully paid medical, dental and vision coverage for employees and their eligible families (yes, fully paid premiums!)
  • The Wellness Program provides reimbursement for mind, body and soul activities such as fitness memberships, babysitting, continuing education and much more
  • The 401(k) retirement plan offers matched contributions at 4% of annual salary
  • Flexible and generous time off - vacation, sick and volunteer days, plus 19 paid holidays - including the last week of the year.
  • Family friendly! 100% paid new parent leave for seven weeks plus an additional five weeks for pregnancy, flexible options to phase back in after leave, fully equipped lactation room.
  • For those emergency moments - long- and short-term disability, life insurance (2x salary) and an employee assistance program
  • Pre-tax savings plans for health care, child care, elder care, public transportation and parking expenses
  • Telecommuting and flexible work schedules available
  • Appropriate fuel for thinking and coding (aka, a pantry full of treats) and monthly massages to help staff relax
  • Great colleagues - diverse staff and contractors speaking dozens of languages from around the world, fantastic intellectual discourse, mission-driven and intensely passionate people
