Site Reliability Engineer
Backcountry’s engineers are aggressive about finding new technologies to make systems better, faster, more stable, and more interesting to work on. Any developer in the company can get passionate about something new, start using it, and rally others around putting it into production. SREs at Backcountry play a crucial role in helping facilitate and provide structure to this innovation. We vet technologies for production readiness, provide deployment guidelines, and support the operational environments. We strive to do this in a collaborative way with our Engineering counterparts as we work toward migrating the organization to a DevOps culture.
About the Job
- Full stack ownership: databases, web servers, application servers and caching services.
- Write and review code to provision servers, monitor systems and integrate ops workflows.
- Develop system documentation and capacity plans.
- Debug hard problems on our live production systems: data, hardware, software, application and network.
- Monitor system health and capacity, and take proactive action to fix problems before they impact users.
- Participate in an on-call rotation and serve as an escalation contact for incidents.
- Collaborate with Development Engineering teams to build, deploy, support, and maintain new functionality.
About You
- 5+ years of experience managing Linux/Unix systems, with demonstrable knowledge of operating system internals, file systems and full-stack troubleshooting.
- 5+ years of experience managing and working with databases, with demonstrable knowledge of at least one of the following: Oracle, PostgreSQL.
- Ability to code really well in at least one high-level language (Perl/Python/PHP/Ruby/Java/C#).
- Ability to rapidly learn new development languages, software, technologies, frameworks and APIs.
- Practical knowledge of shell scripting.
- Understanding of, and extensive experience using, a DVCS (we use Git).
- Extreme troubleshooting skills: identify problems, find a solution, then roll it out.
- Ability to perform guerrilla capacity planning for web-scale internet services architecture.
- Solid knowledge of basic large-scale internet service architectures (like LAMP, CDNs, clusters).
- Configuration and maintenance of common applications used in internet infrastructure, such as Apache, Tomcat, Nginx, Varnish, MySQL, PostgreSQL, MongoDB, NFS, DHCP, SSH, Memcached, DNS, SNMP.
- Understanding of ‘the cloud’ and cloud-like services (IaaS, PaaS, etc.).
- Strong communication and collaboration skills.
- Ability to prioritize tasks and work independently as well as in a team.
Desired Skills and Experience
See application page for details