Desired Skills and Experience

  • Collaborate with engineering teams, product owners and other stakeholders to understand tooling needs for Agile development and Continuous Integration/Deployment (CI/CD) practices
  • Devise automation strategies for upcoming releases while continuously modernizing existing systems
  • Create and manage local development environments for complex software systems
  • Work closely with Architects and Engineers to define build pipelines, administer artifact repositories and automate test tooling
  • Champion best practice methodologies for packaging and distributing web-scale applications
  • Assist in the execution of performance, stress and security test plans
  • Assess release readiness of features from an operational perspective
  • Ensure smooth and optimal production rollouts of provisioning, deployment, configuration, monitoring and other day-to-day operational activities
  • Manage and own testing, staging and production infrastructure and cloud resources (DNS, firewalls, bastion hosts, proxies, load balancers, etc.)
  • Manage ongoing operations of SQL and NoSQL databases like MySQL, Redis and Cassandra
  • Spearhead implementation of predictive service monitoring solutions to proactively detect problems, identify root causes and manage SLAs
  • Promote operational best practices for dynamic infrastructure, including automated capacity modeling, service discovery, containerization and auto scaling
  • Strong automation design skills
  • Strong understanding of cloud infrastructure providers – AWS, GCP, Rackspace, DigitalOcean, etc.
  • Strong understanding of networking concepts, protocols, and security (TCP/IP, UDP, HTTP, NTP, DNS, TLS, etc.)
  • Hands-on experience in vendor-agnostic infrastructure automation and configuration management technologies such as Terraform and Ansible
  • Significant experience with web-scale operation of virtualized Linux farms and infrastructure components such as proxies, reverse proxies, in-memory caches, distributed filesystems, NoSQL databases, etc.
  • Significant experience with open source and commercial application performance monitoring and log analysis stacks
  • Experience with implementing and using continuous delivery design patterns and toolchains
  • Expert-level knowledge of shell scripting
  • Working knowledge of Python/Go
  • Experience with administering and using code review tools such as Crucible/Gerrit and artifact repositories such as Artifactory
  • Attention to detail and ability to work independently on complex problems
  • 3-5 years of experience in a Site Reliability role
  • BS or MS in Computer Science or a related technical field

Benefits

  • Competitive salary and pre-IPO stock options
  • Great culture and team spirit
  • Generous paid medical/dental/vision coverage, plus medical and dependent FSA
  • Newly renovated offices on the south side of Moscone, with easy access to public transportation
  • Catered lunch three times a week
  • Fully stocked break room with unlimited drinks and snacks
  • Onsite games and table sports for when you need a mini-break
  • Opportunities to socialize through weekly onsite events
  • Gym membership in building
  • Pre-tax public transportation allowance
  • Team outings and holiday party