Site Reliability Engineer
With OpenSlate in New York, NY, US
Posted on January 09, 2020
About this job
Compensation: $170k - $200k | Equity
Job type: Full-time
Experience level: Senior
Role: DevOps, System Administrator
kubernetes, docker, terraform, python, linux
OpenSlate is a global measurement and analytics company focused on digital video. We provide advertisers with proactive, brand-suitable targeting solutions that cover the entire campaign lifecycle, from pre-planning to live-campaign monitoring to post-campaign analytics, so they are covered at all times rather than discovering inappropriate content exposure only after the fact. Our technology, data expertise, and proprietary analytics were developed specifically to support the needs of the world’s largest digital video platforms, Facebook and YouTube, and are extendible to all video content. Our platform is used by every major advertising holding company, as well as the world’s largest advertisers.
OpenSlate is headquartered in New York City with offices in Los Angeles, Chicago, London, and Sydney. Our team is made up of hard-working, passionate data nerds who spend their days solving problems at the intersection of data science and digital video.
Responsibilities & Essential Functions
The Site Reliability Engineer at OpenSlate is a hands-on technical position based out of NYC. You will collaborate with technical leadership, development, and system engineers, and you will be responsible for the design, implementation, and deployment of highly available systems in AWS. Core to the position are strong, hands-on experience with Kubernetes and Docker, solid experience architecting scalable systems in AWS, and a drive to automate processes with tools like Jenkins, along with log analysis, metrics, and monitoring.
The ability to communicate and collaborate in a team environment is a must
Hands-on experience with Linux, Kubernetes and Docker
A desire to automate; Jenkins is a plus
Metrics collection and data-based decision making
Experience building visibility into system health with logging, metrics, and alerting
A methodology for root cause analysis, rather than rebooting and crossing fingers
Experience managing AWS infrastructure
VPCs and Availability Zones
Version control; git is a plus
Configuration management; Ansible is a plus
A deep understanding of networking (IP, subnets, routing, etc.) and web protocols (HTTP, TLS, DNS, etc.)
Ideal candidates will also have
Experience with Python
Understanding of distributed processing (e.g. Spark)
Understanding of distributed data stores like Solr, Elasticsearch, etc.
Server provisioning with Terraform or equivalent
Database management; PostgreSQL is a plus
Understanding of Linux-based web stacks is a plus: e.g. Django, DRF, Celery, RMQ, Redis
Understanding of security best practices (SSL/TLS, firewalls, Security Groups, dynamic credentialing)