Site Reliability / DevOps Engineer at PubNub (San Francisco, CA)

Desired Skills and Experience

Collaborate with engineering teams, product owners and other stakeholders to understand tooling needs for Agile development and Continuous Integration/Deployment (CI/CD) practices
Devise automation strategies for upcoming releases while continuously modernizing existing systems
Create and manage local development environments for complex software systems
Work closely with Architects and Engineers to define build pipelines, administer artifact repositories and automate test tooling
Champion best practice methodologies for packaging and distributing web-scale applications
Assist with the execution of performance, stress and security test plans
Assess release readiness of features from an operational perspective
Ensure smooth and optimal production rollouts of provisioning, deployment, configuration, monitoring and other day-to-day operational activities
Manage and own testing, staging and production infrastructure and cloud resources (DNS, firewalls, bastion hosts, proxies, load balancers, etc.)
Manage ongoing operations of SQL and NoSQL databases like MySQL, Redis and Cassandra
Spearhead implementation of predictive service monitoring solutions to proactively detect problems, identify root causes and manage SLAs
Promote operational best practices for dynamic infrastructure, including automated capacity modeling, service discovery, containerization and auto scaling
Strong automation design skills
Strong understanding of cloud infrastructure providers such as AWS, GCP, Rackspace and DigitalOcean
Strong understanding of networking concepts, protocols, and security (TCP/IP, UDP, HTTP, NTP, DNS, TLS, etc.)
Hands-on experience in vendor-agnostic infrastructure automation and configuration management technologies such as Terraform and Ansible
Significant experience with web-scale operation of virtualized Linux farms and infrastructure components such as proxies, reverse proxies, in-memory caches, distributed filesystems, NoSQL databases, etc.
Significant experience with open source and commercial application performance monitoring and log analysis stacks
Experience with implementing and using continuous delivery design patterns and toolchains
Expert-level knowledge of shell scripting
Working knowledge of Python/Go
Experience with administering and using code review tools such as Crucible/Gerrit and artifact repositories such as Artifactory
Attention to detail and ability to work independently on complex problems
3-5 years of experience in a Site Reliability role
BS or MS in Computer Science or related technical field
Competitive salary and pre-IPO stock options
Great culture and team spirit
Generous paid medical/dental/vision coverage, plus medical and dependent FSA
Newly renovated offices on the south side of Moscone, with easy access to public transportation
Catered lunch three times a week
Fully stocked break room with unlimited drinks and snacks
Onsite games and table sports for when you need a mini-break
Opportunities to socialize through weekly onsite events
Gym membership in building
Pre-tax public transportation allowance
Team outings and holiday party