Site Reliability Engineer

With Zattoo in Berlin - DE

Posted on January 07, 2022

About this job

Job type: Full-time
Experience level: Mid-Level
Role: DevOps, System Administrator
Industry: Broadcast, IPTV, Software Development / Engineering
Company size: 201-500 people
Company type: Private


kubernetes, docker, nginx, jenkins

Job description

The Role

At Zattoo we are building the TV platform of the future. To make that possible, we are looking for a Site Reliability Engineer to join our Operations team. As the demand for unicast TV delivery is constantly growing, we are scaling out our custom-built delivery infrastructure to serve linear and non-linear video data on a multi-Tbps scale. Because we control the whole chain, from ingest through encoding/transcoding to packaging and delivery, there are many exciting areas to work on and to push TV to a new level.

You will play a key role in optimizing our systems architecture, monitoring and alerting. You will work closely with the core video, core middleware and SRE/Ops teams to ensure the highest quality of service for our customers. If you have a strong interest in monitoring, scaling out and optimizing complex distributed systems, you can have a huge impact on site performance and network optimization at Zattoo.

What You'll Do

Become a valued member of Zattoo’s Ops/SRE team and improve our core services

Increase optimization and automation of our setup in various areas such as monitoring, alerting, performance and profiling

Develop tools and software for the TOC (TV Operations Center) to monitor and control our services 24/7

Analyse and understand Zattoo’s core services and explore ways to scale them efficiently

Propose improvements to platforms, infrastructure, tools and processes

Collaborate with, support and advise engineers so that they write code that performs well and scales

Advocate for security and stability, raise awareness of weak spots and develop plans to mitigate them

What You'll Bring

2+ years of proven experience in a Site Reliability Engineering or Ops position, ideally operating a complex web-based service

Strong experience in monitoring, preferably hands-on experience with Prometheus or similar technologies

Experience in container management with tools such as Kubernetes, Docker or LXC, without relying on cloud providers such as AWS or Google Cloud

Experience working with standard Ops tools; we use Debian, Nginx, Puppet and Jenkins

Comfortable working with remote colleagues, multidisciplinary teams and external partners

Fluent verbal and written English language skills

Bonus: A BS/MS degree in computer science or similar discipline

Bonus: Basic understanding of programming languages such as Bash, Python, Ruby and/or Go-based web frameworks

Bonus: Good understanding of datastores (Elasticsearch, Cassandra, Redis, Memcached, MySQL, database replication/failover)
