Cloud Operations Engineer

With Prometheus Group in Raleigh NC US

Posted on November 07, 2019

About this job

Job type: Full-time
Role: System Administrator


cloud, saas, amazon-web-services

Job description

The Cloud Operations Engineer, as a member of the cloud ops team, is responsible for the design, implementation, and sustainable operation of all cloud infrastructure that supports Prometheus Group's SaaS offerings. Cloud operations engineers share ownership of customer-facing infrastructure and applications, as well as the processes and procedures associated with operating them. The engineer works closely with software architects, DevOps engineers, and systems administrators to ensure the continuity, availability, and security of all SaaS products. This role reports to the Director of Information Technology.

Daily Responsibilities Include

  • Participate actively in an on-call rotation

  • Design, test and deliver a secure cloud runtime to support the full Prometheus Group stack

  • Maintain core application infrastructure that supports customer-facing environments

  • Manage and own both production and non-production environments for use in application delivery, debugging, and testing

  • Diagnose and correct customer-impacting issues related to health, reliability, and scale

  • Apply change management procedures to guarantee security and operational continuity of the environment

  • Author and execute operating procedures and guidelines for the cloud environment

  • Keep process and architecture documentation consistent with the delivered system to ensure auditability, scalability, and repeatability

  • Participate in frequent internal-facing engineering activities with product development, implementation, and IT

  • Participate in occasional external-facing engineering activities in partnership with customers’ engineering teams and auditors

  • Ensure SLA compliance for customer systems

  • Ensure compliance with safe data handling procedures as defined by policy


Qualifications

  • Bachelor’s degree in computer science or a similar engineering discipline

  • 2-5 years of experience in SaaS operations, cloud engineering, or a closely related role

  • Experience with AWS, GCP, or Azure core technologies (database and application servers, monitoring tools, etc.)

  • Prior on-call operations experience


  • Experience with production Kubernetes or other container orchestration technologies

  • Experience with Infrastructure as Code (e.g., Ansible, Terraform, CloudFormation)

  • Experience implementing GitOps in a multi-product environment

  • Experience with monitoring and reporting tools (e.g., Grafana, Datadog, New Relic)

  • Experience with public key infrastructure design and management (TLS certificates, Amazon certificate services)
