We are seeking a Senior Site Reliability Engineer to join our team in Venice, CA! ZEFR Site Reliability Engineers solve operational problems with a software engineering mindset. They use strong coding skills and knowledge of AWS to boost engineering productivity and support application releases. You should have experience monitoring products, responding to roadblocks, and participating in product deployments.

Here’s what you’ll do:

  • Apply production experience with Amazon Web Services (AWS)
  • Implement automation using a combination of Mesos, Docker, Consul, AWS, Ansible/Puppet/SaltStack, and Linux command-line tools
  • Create effective monitoring plans for our products
  • Proactively maintain the health of production environments
  • Respond to system performance issues and outages
  • Troubleshoot quickly and effectively, including root cause analysis
  • Participate in change management and in creating and reviewing deployment plans
  • Improve build, continuous integration (Jenkins), testing, and deployment pipelines
  • Help build a culture of solving operations problems with software engineering discipline
  • Participate in an on-call rotation
  • Write code

Desired Skills and Experience

  • Bachelor's degree in Computer Science or a related field, or equivalent work experience
  • Experience with Kinesis and/or Kafka
  • Proficiency in Python, Go, and Bash scripting
  • 5+ years building web-scale systems as a software or systems engineer
  • Strong Linux skills across most major environments
  • Experience with one of Puppet, Ansible, Chef, SaltStack, etc.
  • Docker experience
  • Desire and ability to work independently and take ownership of complex tasks
  • Strong written and oral communication, organization, and documentation skills
  • Python (Flask), JavaScript (Node.js, ES6, CoffeeScript), Go, Elixir
  • React, AngularJS, HTML5/CSS3, Bootstrap
  • Linux/Unix, PostgreSQL, Redshift, Cassandra, DynamoDB, Redis, Docker
  • AWS services: ECS, SQS, Kinesis, SNS