Site Reliability Engineer

What you can expect to work on in this role

Site Reliability Engineers are primarily responsible for Linux system administration, platform engineering, and infrastructure management.

Work with other engineering teams to
- Scale existing applications to handle increased #webscale
- Bring new systems online, such as containerized Ruby applications, Redis clusters, or Redshift servers
- Improve shared libraries and practices around authentication, logging, alerting, and monitoring

Work primarily within our Tech Ops team to
- Enhance infrastructure security, including VPC hardening and Security Group modifications using Ansible and Terraform
- Manage network configurations within our AWS VPC and VPN
- Strengthen our DDoS and intrusion prevention systems via WAF injection, ELB configuration, and auto-scaling

Answer difficult questions like
- What’s the right composition of tools (e.g., Docker, Docker Swarm, Consul, Vault, AWS ECS, Apache Mesos) to orchestrate application deployment and management?
- What CIDR should I pick for my VPC in us-west? How many distinct subnets should it have?
- You have 4 cloud instances that need high-read, low-write access to the same filesystem. How should you achieve that while ensuring HA?

You’ll be effective if you
- Treat data integrity and confidentiality as first-class citizens in everything you build
- Have ample experience with cloud providers (e.g., AWS or DigitalOcean)
- Live and breathe continuous delivery automation
- Are comfortable with a variety of scripting languages like Bash, Ruby, and Python
- Have experience with deployment automation tools like Ansible or Puppet
- Have experience with application load balancing solutions for scaling and high availability
- Thrive in a startup environment

Desired Skills and Experience

See application page for details