Site Reliability Engineer / DevOps

Be the first of your friends to declare, “I love where I work!” and actually mean it. Laugh hard and work hard with some of the best and brightest in the tech industry.

GrubHub Holdings Inc. is the nation’s leading online and mobile food-ordering company dedicated to connecting hungry diners with local takeout restaurants. The GrubHub Holdings Inc. portfolio of brands includes GrubHub, Seamless, MenuPages and Allmenus. The company’s online and mobile ordering platforms allow diners to order directly from thousands of takeout restaurants across the country and London, and every order is supported by the company’s 24/7 customer service. GrubHub Holdings Inc. has offices in Chicago, New York City and London.

With a career at GrubHub Holdings Inc., you can order your cake and eat it, too!

About The Job

GrubHub engineers own and run their products and services from conception to continuous operation. DevOps engineers play a key role: they are embedded within teams to focus on the operational aspects of our services.

Responsibilities

Create, maintain, own and operate your team’s services that support fundamental capabilities within GrubHub’s products.
Tackle some of the most challenging problems you can face developing high availability services in a distributed cloud environment that needs to scale exponentially.
Help evaluate and choose emerging technologies: new service protocols and architectures, self-healing capabilities, globally distributed caching, performance and code quality tooling, etc. Determine the right tool for the right task.

Tools We Work With

Java for microservices
Cassandra
Docker (in production!)
Mesos and Marathon for job scheduling
Combination of AWS and our own hardware
Python and Fabric for automation and our CD pipeline
Jenkins for builds and task execution
Linux (CentOS and Ubuntu)
Datadog for metrics and alerting
Puppet

Requirements

4+ years of experience building complex distributed systems. In this role you are the one gravitating toward the operational concerns of the team, focusing on reliability, performance, capacity planning and automation of everything.
Proficient in high-level scripting languages such as Python or Ruby (Python preferred)
Experience developing solutions leveraging Docker
Experience managing Linux (CentOS, Ubuntu) systems
Configuration management experience with Puppet, Chef, or Ansible
Experience building and implementing monitoring for network, server and application status
Experience with monitoring tools such as Graphite, Nagios, Datadog, Runscope
Experience with log aggregation systems using Splunk, Logstash, Loggly, Elasticsearch
Experience with continuous integration, testing, and deployment using Git and Jenkins
Experience with relational databases (MySQL)
Experience with NoSQL databases (Cassandra, Couchbase, MongoDB)
Experience with Hadoop (Cloudera, DataStax), Mahout and other big data platforms
Exceptional communication and troubleshooting skills.

Desired Skills and Experience

See application page for details