Desired Skills and Experience
- Collaborate with engineering teams, product owners and other stakeholders to understand tooling needs for Agile development and Continuous Integration/Deployment (CI/CD) practices
- Devise automation strategies for upcoming releases while continuously modernizing existing systems
- Create and manage local development environments for complex software systems
- Work closely with Architects and Engineers to define build pipelines, administer artifact repositories and automate test tooling
- Champion best practice methodologies for packaging and distributing web-scale applications
- Assist in the execution of performance, stress and security test plans
- Assess release readiness of features from an operational perspective
- Ensure smooth and optimal production rollouts of provisioning, deployment, configuration, monitoring and other day-to-day operational activities
- Manage and own testing, staging and production infrastructure and cloud resources (DNS, firewalls, bastion hosts, proxies, load balancers etc.)
- Manage ongoing operations of SQL and NoSQL databases like MySQL, Redis and Cassandra
- Spearhead implementation of predictive service monitoring solutions to proactively detect problems, identify root causes and manage SLAs
- Promote operational best practices for infrastructure dynamism including automated capacity modeling, service discovery, containerization and auto scaling
- Strong automation design skills
- Strong understanding of cloud infrastructure providers – AWS, GCP, Rackspace, DigitalOcean etc.
- Strong understanding of networking concepts, protocols, and security (TCP/IP, UDP, HTTP, NTP, DNS, TLS etc.)
- Hands-on experience in vendor-agnostic infrastructure automation and configuration management technologies such as Terraform and Ansible
- Significant experience with web-scale operation of virtualized Linux farms and infrastructure components such as proxies, reverse proxies, in-memory caches, distributed filesystems, NoSQL databases etc.
- Significant experience with open source and commercial application performance monitoring and log analysis stacks
- Experience implementing and using continuous delivery design patterns and tool chains
- Expert-level knowledge of shell scripting
- Working knowledge of Python/Go
- Experience administering and using code review tools such as Crucible/Gerrit and artifact repositories such as Artifactory
- Attention to detail and ability to work independently on complex problems
- 3-5 years of experience in a Site Reliability role
- BS or MS in Computer Science or related technical field.
- Competitive salary and pre-IPO stock options
- Great culture and team spirit
- Generous paid medical/dental/vision coverage, plus medical and dependent FSA
- Newly renovated offices on the south side of Moscone, with easy access to public transportation
- Catered lunch three times a week
- Fully stocked break room with unlimited drinks and snacks
- Onsite games and table sports for when you need a mini-break
- Opportunities to socialize through weekly onsite events
- Gym membership in building
- Pre-tax public transportation allowance
- Team outings and holiday party