Site Reliability Engineer

With OpenSlate in New York, NY, US

Posted on January 09, 2020

About this job

Compensation: $170k - 200k | Equity
Job type: Full-time
Experience level: Senior
Role: DevOps, System Administrator

Technologies

kubernetes, docker, terraform, python, linux

Job description

OpenSlate is a global measurement and analytics company focused on digital video. We provide advertisers with proactive, brand-suitable targeting solutions that span the entire campaign lifecycle (pre-planning, live-campaign monitoring, and post-campaign analytics), keeping them covered at all times rather than discovering exposure to inappropriate content after the fact. Our technology, data expertise, and proprietary analytics were developed specifically to support the needs of the world's largest digital video platforms, Facebook and YouTube, and are extensible to all video content. Our platform is used by every major advertising holding company, as well as by the world's largest advertisers.

OpenSlate is headquartered in New York City, with offices in Los Angeles, Chicago, London, and Sydney. Our team is made up of hard-working, passionate data nerds who spend their days solving problems at the intersection of data science and digital video.

Responsibilities & Essential Functions

The Site Reliability Engineer at OpenSlate is a hands-on technical position based in NYC. You will collaborate with technical leadership, developers, and systems engineers, and you will be responsible for the design, implementation, and deployment of highly available systems in AWS. Core to the position are strong hands-on experience with Kubernetes and Docker, solid experience architecting scalable systems in AWS, a drive to automate processes with tools like Jenkins, and experience with log analysis, metrics, and monitoring.

Requirements

The ability to communicate and collaborate in a team environment is a must

Hands-on experience with Linux, Kubernetes and Docker

A desire to automate; Jenkins is a plus

Metrics collection and data-driven decision-making

Experience building visibility into system health with logging, metrics, and alerting

A methodical approach to root cause analysis, rather than rebooting and crossing your fingers

Experience managing AWS infrastructure, including Auto Scaling, VPCs and Availability Zones, Security Groups, and cost forecasting

Version control; Git is a plus

Configuration management; Ansible is a plus

A deep understanding of networking (IP, subnets, routing, etc.) and web protocols (HTTP, TLS, DNS, etc.)

Ideal candidates will also have

Experience with Python

Understanding of distributed processing (e.g. Spark)

Understanding of distributed data stores such as Solr and Elasticsearch

Server provisioning with Terraform or an equivalent tool

Database management; PostgreSQL is a plus

Understanding of Linux-based web stacks is a plus, e.g. Django, DRF, Celery, RabbitMQ, Redis

Understanding of security best practices (SSL/TLS, firewalls, Security Groups, dynamic credentialing)