The Cloud SRE (Service Reliability Engineering) team works within Comcast’s Technology and Product Division to deploy, operate, and improve the Comcast private cloud and the applications that run on it.

Do you want to move beyond simply using cloud technology and, instead, help build and improve a large-scale, state-of-the-art cloud? How about one with nearly 800TB of memory and over 1 million vCPUs?

As an SRE on the Cloud SRE team, you will help build and improve a private cloud at that scale, digging deep into the internals of our open-source toolset to help create a resilient, scalable cloud platform used by millions of Comcast customers.

The Cloud SRE team at Comcast views large-scale platform challenges as software development opportunities. We work with the OpenStack community, contributing code back to the community as we develop new capabilities, and with Comcast application development teams, helping developers deliver applications faster. We use tools like Ansible and applications written in Python and Go to rapidly deploy new cloud capabilities and gain insight into the inner workings of our cloud.

In your role, you will not only help deploy the cloud but also help accelerate the pace of deployment and gain greater operational awareness of the running cloud. This includes building new tools on top of the OpenStack APIs to do things like collect real-time metrics from the tens of thousands of VMs running on the cloud. You will create new metrics and identify monitoring deliverables to improve site reliability. You will also participate in proof-of-concept evaluations of cutting-edge tools and recommend new practices to maximize cloud efficiency. The platform that you will build and constantly improve hosts hundreds of applications spanning many data centers nationwide. These applications form the backbone of the X1 Entertainment Operating System and other mission-critical applications used by millions of Comcast customers across the country. Messaging, data analysis, and real-time web applications all run on our platform.

Here are some of the specific technologies we use:

  • OpenStack
  • Ansible
  • Python
  • Shell scripting
  • Ubuntu Linux
  • MariaDB
  • RabbitMQ

Skills & Requirements

  • Hands-on experience with OpenStack (Kilo release or later) in an SRE role.
  • 5+ years of software development experience.
  • Hands-on experience with Ansible. Experience with OpenStack Ansible is a plus.
  • Solid Unix system administration experience.
  • Understanding of infrastructure-as-code.
  • Understanding of DevOps concepts.
  • Understanding of large-scale distributed systems.

Comcast is an EOE/Veterans/Disabled/LGBT employer.