Desired Skills and Experience

  • Lead high-complexity projects from scoping to deployment to production.
  • Develop effective tools, alerts, and responses to identify and address reliability risks.
  • Work closely with search engineers to triage production issues and determine appropriate remediation, including code changes and performance considerations.
  • Share on-call responsibilities: collaborate with other engineers to triage and fix reliability issues in production, and put out fires autonomously when they arise.
  • Help determine the future technical direction of our deployment, with the aim of improving reliability and performance.
  • Significant experience as a site reliability engineer (roughly 5 or more years), including on-call responsibilities.
  • Ability to identify the root causes of instability in high-traffic distributed systems.
  • Experience configuring and troubleshooting Linux and NGINX.
  • Strong understanding of reliability challenges of large-scale deployments.
  • Moderate to advanced programming experience, preferably in a high-level language such as Perl or Python.
  • Effective project management skills.
  • Strong decision makers. You can make a decision when faced with competing priorities and limited information.
  • Someone interested in the why, not just the how. You like to analyze situations and won’t be satisfied with a shallow analysis.
  • Creative problem solvers and risk takers. You like to take initiative in pushing a project forward but can make adjustments based on team feedback.
  • Someone who will put the user first in on-call and project work.
  • Strong communication skills. You can validate and communicate your decisions clearly.
  • We are a small, remote team spread across different time zones, and we communicate with a variety of tools throughout the day. You should feel comfortable with the intricacies of this way of working.
  • Sometimes we meet up! You can expect to travel at least twice a year: once for our all-hands meetup and once for a team retreat (each about 4-5 days).
  • We want to have a major impact on raising the standard of trust online. To do this, we believe in a focused approach: company-wide objectives, with each team member working on a single top priority at a time.
  • Our work philosophy is built upon empowered project management. All team members have opportunities to run projects.
  • All projects are run transparently, and we encourage everyone to participate in areas of interest throughout the company. Anyone and everyone can (and should) ask questions and offer feedback around the product and internal projects.
  • We try to exemplify our values (build trust, question assumptions, and validate direction) in everything we do.
