Shmoop (www.shmoop.com) is a digital curriculum company that makes learning, teaching, and test prep materials that are (get this) smart and fun. We are on the hunt for a Site Reliability Engineer who is ready to jump in and get started!

Responsibilities

You will drive proactive projects that reduce risk or make our systems easier to manage and use. Depending on your specific experience, you will own our deployment tools & processes, configuration management, performance testing and tuning, security, capacity planning, logging systems, DBA & sysadmin duties, and operational processes.

In addition, you’ll ensure we react to incidents appropriately by:

  • Implementing appropriate monitoring and alerting to shorten response times and the mean time to recovery (MTTR)
  • Responding to and resolving issues we encounter
  • Learning from and addressing the root cause(s) in order to avoid similar issues in the future
  • Communicating with key folks during and after the incident, informing them of the incident, its impact, and the resulting plans & actions

You are the go-to person for all things site-related, and this doesn’t scare you in the least. This is your adrenaline rush.

Required Qualifications

  • Have performed technical operations duties for a high-traffic, transactional site or platform.
  • Know your way around a Unix shell and are confident in your scripting abilities.
  • Familiar with the entire web stack: frontend, the application layer, caching, and databases.
  • Experienced with monitoring & alerting tools and techniques to reduce mean time to recovery (MTTR).
  • Experienced with configuration management, CDNs, and load balancers.
  • You’re curious, humble, and able to learn.
  • Strong communication & interpersonal skills.

Bonus points if you have

  • Bachelor’s degree in Computer Science or equivalent.
  • Experience with AWS.
  • Experience with Python.
  • Startup experience.

Desired Skills and Experience

See application page for details