Site Reliability / DevOps Senior Manager / Director at PubNub (San Francisco, CA)
Job Summary:
The Engineering/DevOps team is responsible for designing, developing, operationalizing, sustaining and scaling PubNub’s Data Stream Network. This includes our secure, distributed messaging bus as well as all add-on services and data pipelines including Storage/Playback, Presence, Access Management, Push Gateways and more.
We are a strong team of Engineers and DevOps who are low on drama and high on results. Our mission is to deliver high uptime, performance and scale on our real-time network, projecting trust and confidence to our customers, while delivering innovation and earning high customer loyalty through a remarkable experience with our service. If you are on a journey to seek a team whose norm is to swarm and perform, we are your destination!
As a DevOps Leader, you will be directly responsible for championing tools and technologies, and for adopting and adapting the frameworks and services needed by current and forward-looking features.
Responsibilities
- Reporting to the VP Engineering/Ops, you will be responsible for our DevOps/operations strategy.
- Collaborate with engineering teams, product owners and other stakeholders to understand tooling needs for Agile development and Continuous Integration/Delivery/Deployment (CI/CD) practices.
- Devise automation strategies for upcoming releases while continuously modernizing existing systems.
- Create and manage local development environments for complex software systems.
- Manage and own pre-production (testing/staging) infrastructure and cloud resources (DNS, firewalls, proxies, load balancers, databases, monitoring systems etc.).
- Create and manage production environments for complex software systems.
- Work closely with Architects and Engineers to define and build pipelines, administer artifact repositories and automate test tooling.
- Champion best practice methodologies for packaging and distributing web-scale applications.
- Assist with the execution of performance, stress and security testing.
- Assess release readiness of features from an operational perspective.
- Ensure system reliability through monitoring and other day-to-day operational activities.
- Manage reporting, monitoring and alerting metrics.
- Lead change management and incident management processes.
- Lead tool selection for production, automation and process.
Desired Skills and Experience
Minimum criteria
- Proven skills and background in running a real-time, highly scaled system handling billions of transactions a month.
- Expertise in AWS cloud infrastructure.
- Experience with Cassandra deployments on the order of 72 nodes or more.
- Strong understanding of networking concepts, protocols, and security.
- Hands-on experience with vendor-agnostic infrastructure automation and configuration management technologies such as Terraform and Ansible.
- Experience with container technologies such as Docker, Mesos etc.
- Past experience with web-scale operation of virtualized Linux farms and infrastructure components such as proxies, reverse proxies, in-memory caches, distributed filesystems etc.
- Experience implementing change management and incident management tools and processes.
Advantageous
- Strong automation design skills
- Experience implementing and using Continuous Integration and Delivery (CI/CD) design patterns and toolchains
- Expert-level knowledge of shell scripting
- Working knowledge of Python
- Experience administering and using code review tools such as Crucible/Gerrit and artifact repositories such as Artifactory
- Attention to detail and ability to work independently on complex problems
- BS or MS in Computer Science or related technical field.
Benefits
- Competitive salary and pre-IPO stock options
- Great culture and team spirit
- Generous paid medical/dental/vision coverage, plus medical and dependent FSA
- Newly renovated offices on the south side of Moscone, with easy access to public transportation
- Catered lunch three times a week
- Fully stocked break room with unlimited drinks and snacks
- Onsite games and table sports for when you need a mini-break
- Opportunities to socialize through weekly onsite events
- Gym membership in building
- Pre-tax public transportation allowance
- Team outings and holiday party