Site Reliability Engineer at Zattoo (Berlin, Germany)

At Zattoo we are building the TV platform of the future. To make that possible, we are looking for engineers to join our SRE and infrastructure team. As the demand for unicast TV delivery is constantly growing, we are scaling out our custom-built delivery infrastructure to serve linear and non-linear video data on a multi-Tbps scale. Because we control the whole chain, from ingest through encoding/transcoding to packaging and delivery, there are many exciting areas to work on and to push TV to a new level.

This position will play a key role in optimizing our systems architecture, monitoring and alerting. You will work closely with the core video, core middleware and SRE/Ops teams to ensure maximum QoE for our customers.

If you have a strong interest in monitoring, scaling out and optimizing complex distributed systems, you can have a huge impact on site performance and network optimization at Zattoo.

Desired Skills and Experience

Become a valued member of Zattoo’s Ops/SRE team and operate, monitor, troubleshoot and improve our core services
Drive optimization and automation of our setup in various areas such as monitoring, alerting, performance and profiling
Help to build tools and software for the TOC (TV Operations Center) to monitor and control our services 24/7
Get a deep understanding of Zattoo’s services and explore ways to scale them efficiently
Help make decisions on platforms, infrastructure, tools and processes
Support and advise engineers in writing code that performs well and scales
Be an advocate for security and stability, raise awareness of weak spots and develop plans to mitigate them
Help us build the future of TV
2+ years of proven experience in a Site Reliability Engineering or Ops position, ideally operating a complex web-based service
Fundamental understanding of the internet and modern web services (e.g. HTTP(S), DNS, RESTful APIs, streaming)
Deep understanding of Networking Systems such as:

L2 ARP / spanning tree / LAG (link aggregation)
L3 routing OSPF / BGP / multicast
IPv6
Experience in Monitoring

SNMP
Zenoss or a similar monitoring solution such as Icinga / Nagios / Prometheus
Experience in working with standard Ops tools (such as Debian/APT, nginx, Linux, Puppet, Jenkins)
Comfortable working with remote colleagues, multidisciplinary teams and external partners
Fluent verbal and written English language skills
A BS/MS degree in computer science or similar discipline
Basic understanding of programming languages such as Python, Ruby and/or Go, and of web frameworks based on them
Good understanding of data stores (Elasticsearch, Cassandra, Redis, Memcached, MySQL, database replication/failover)
Limited travel might be required