Site Reliability / DevOps Engineer at PubNub (San Francisco, CA)
Job Summary
The Engineering team is responsible for designing, developing, operationalizing, sustaining and scaling PubNub’s Data Stream Network. This includes our secure, distributed messaging bus as well as all add-on services and data pipelines including Storage/Playback, Presence, Access Management, Push Gateways and more.
We are a strong team of Engineers and Architects who are low on drama and high on results. Our mission is to use uptime, performance and scale as tools to extend the fabric of real-time possibilities, and to honor the trust and confidence our customers place in us to deliver delightful user experiences.
We believe that teams that balance linear progression and non-linear innovation achieve the best results. Consequently, we place the team ahead of the individual when solving problems and celebrating achievements. If you are looking for a team whose norm is to swarm and perform, we are your destination!
As a DevOps Engineer, you will be directly responsible for championing tools and technologies, and for adopting and adapting the frameworks and services that current and forward-looking features need. In short, you’ll serve as a productivity multiplier for the other developers on the team.
Responsibilities
- Collaborate with engineering teams, product owners and other stakeholders to understand tooling needs for Agile development and Continuous Integration/Deployment (CI/CD) practices
- Devise automation strategies for upcoming releases while continuously modernizing existing systems
- Create and manage local development environments for complex software systems
- Work closely with Architects and Engineers to define build pipelines, administer artifact repositories and automate test tooling
- Champion best-practice methodologies for packaging and distributing web-scale applications
- Assist in the execution of performance, stress and security test plans
- Assess release readiness of features from an operational perspective
- Ensure smooth and optimal production rollouts of provisioning, deployment, configuration, monitoring and other day-to-day operational activities
- Manage and own testing, staging and production infrastructure and cloud resources (DNS, firewalls, bastion hosts, proxies, load balancers, etc.)
- Manage ongoing operations of SQL and NoSQL databases like MySQL, Redis and Cassandra
- Spearhead implementation of predictive service monitoring solutions to proactively detect problems, identify root causes and manage SLAs
- Promote operational best practices for infrastructure dynamism, including automated capacity modeling, service discovery, containerization and auto scaling
Desired Skills and Experience
- Strong automation design skills
- Strong understanding of cloud infrastructure providers such as AWS, GCP, Rackspace and DigitalOcean
- Strong understanding of networking concepts, protocols and security (TCP/IP, UDP, HTTP, NTP, DNS, TLS, etc.)
- Hands-on experience with vendor-agnostic infrastructure automation and configuration management technologies such as Terraform and Ansible
- Significant experience with web-scale operation of virtualized Linux farms and infrastructure components such as proxies, reverse proxies, in-memory caches, distributed filesystems and NoSQL databases
- Significant experience with open source and commercial application performance monitoring and log analysis stacks
- Experience implementing and using continuous delivery design patterns and toolchains
- Expert-level knowledge of shell scripting
- Working knowledge of Python/Go
- Experience administering and using code review tools such as Crucible/Gerrit and artifact repositories such as Artifactory
- Attention to detail and ability to work independently on complex problems
- 3-5 years of experience in a Site Reliability role
- BS or MS in Computer Science or a related technical field
Benefits
- Competitive salary and pre-IPO stock options
- Great culture and team spirit
- Generous paid medical/dental/vision coverage, plus medical and dependent FSA
- Newly renovated offices on the south side of Moscone, with easy access to public transportation
- Catered lunch three times a week
- Fully stocked break room with unlimited drinks and snacks
- Onsite games and table sports for when you need a mini-break
- Opportunities to socialize through weekly onsite events
- Membership at the in-building gym
- Pre-tax public transportation allowance
- Team outings and holiday party