Site Reliability Engineer (remote) at Campaign Monitor (San Francisco, CA)
Campaign Monitor is seeking a Site Reliability Engineer to join our growing SRE team. This is a remote position based in the US. You’ll work on automating and scaling our systems to keep pace with our growth. We send over 2 billion emails every month, and our infrastructure needs to scale accordingly so we can deliver the best user experience possible.
You’re smart, personable and friendly, and you communicate clearly and respectfully. You live and breathe problem solving for mission-critical services and are passionate about learning the challenges and trends in Site Reliability.
What does a Site Reliability Engineer at Campaign Monitor do?
- Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions. Facilitate root cause analysis sessions and communicate the findings back to the product teams.
- Own the end-to-end availability and performance of mission-critical services, and build automation to prevent problem recurrence; eventually, automate the response to all non-exceptional service conditions. Create visibility into how we perform against our SLA through active monitoring and reporting.
- Design, write, and deliver software to improve the availability, scalability, latency, and efficiency of Campaign Monitor’s services.
- Influence and create new designs, architectures, standards and methods for large-scale distributed systems.
- Engage in service capacity planning and demand forecasting, software performance analysis, and system tuning.
- Conduct periodic on-call duties using a follow-the-sun model.
- Measure everything, report on interesting events, and alert on critical issues.
- Create and update process documentation, playbooks, and incident reports.
- Work with other teams to build, test and roll out system
Desired Skills and Experience
- Computer Science or related degree, or several years of relevant industry experience
- Fluency in at least two programming languages (such as C#, Go, C++, Java or JavaScript) and strong scripting skills.
- You’re comfortable working from the command line; in fact, using a GUI is for amateurs.
- You’ve used a range of storage engines (relational databases, Elasticsearch, Cassandra, etc.) and know when each type is useful.
- Experience with a public cloud provider, such as AWS
- All your infrastructure is code, and you’re experienced with a configuration management tool (Ansible, Salt, etc.)
- You can use a DVCS like Git or Mercurial
- You know how web applications work, from the underlying network protocols (HTTP, TCP) through to web servers (IIS, nginx), browser behaviour and everything in between
- You know how to use DevTools or similar to improve web application performance
- Strong knowledge of TCP/IP and UDP networking and troubleshooting with Wireshark, nmap and friends
- Effective communication skills, via interactive mediums and documentation
- Big data systems such as Elasticsearch, Cassandra or Hadoop
- Distributed data storage systems like HDFS
- Competitive salary + equity + medical/dental/vision benefits
- Brand new MacBook Air with all the necessary accessories
- Adjustable desks so you’re comfy
- Daily catered meals and loads of snacks and drink options
- Weekly happy hours involving Corn Hole/Ping Pong Tourneys and super fun, frequent team events
- Flexible work hours and great vacation (we believe in the importance of work-life/personal-life balance)
- Paid time off to volunteer in our community
- Training budget to make sure you’re always learning and growing