Site Reliability Engineer (SRE) at Grindr (West Hollywood, CA)

Grindr is a complex ecosystem of multiple technologies. The Site Reliability Engineer (SRE) is responsible for implementing automation solutions, as well as maintaining and improving the Grindr technical operations ecosystem. Solving challenges in distributed computing, high-performance computing, and high availability at runtime is a day-to-day theme. We are looking for a passionate technologist who enjoys complex problem solving.

Desired Skills and Experience

Deployment and support of the full lifecycle of applications in Amazon Web Services
Design, implement, document, and handle all aspects of Linux systems (CentOS, Debian, Ubuntu)
Identify repetitive, manual tasks and automate them
Develop effective tooling, alerting, and response procedures to identify and address reliability risks
Participate in the on-call rotation with other members of the Performance and Reliability teams (PagerDuty)
Engage with product engineering teams to triage production outages and carry forward action items to improve ongoing reliability
Evangelize cloud- and DevOps-centric best practices to improve the reliability, performance, and cost-efficiency of our stack
Evaluate bleeding-edge technologies for our use
Assist in after-hours deployments
Work with the Development team to build and maintain Java runtime and MySQL environments
Write and maintain moderately complex scripts (Bash, Python, Ruby, JavaScript, and/or Perl) to help automate and scale
BS degree in engineering or equivalent work experience
An understanding of high-traffic, large-scale distributed systems and the ability to perform root cause analysis on stability and performance related events in such environments
Familiarity with continuous integration and continuous deployment systems and the ability to describe the pros, cons, and pitfalls of the various solutions
Strong familiarity with Git and other version control systems
Experience with Linux systems; must understand how processes, users, groups, privileges and package managers work
Hands-on experience with backup and restore tools
Experience with automation and configuration management systems such as Puppet, Ansible, Salt, etc.
Competency with PostgreSQL, Cassandra, Redis, Amazon Redshift
Expert proficiency in UNIX scripting languages (Bash, Ruby, Python) and some experience with compiled languages (Go, Java, etc.)
Experience with configuration and troubleshooting of Linux, Java, Tomcat, and other middleware technologies
Passion for resolving reliability issues and identifying strategies to mitigate them going forward
Experience with cloud computing platforms (particularly AWS) a plus
Strong Linux system-level analysis capabilities
Passion for clear communication, especially prioritizing concerns to align with team and business goals
Deep network analysis experience
Experience with Terraform and Atlas
Thorough understanding of low-level networking
Experience with Elasticsearch and Amazon Aurora MySQL

Benefits

Full coverage medical and dental insurance
Unlimited sick leave policy
Competitive Salaries and Options
401(k)
Catered breakfast, every day
Stocked kitchen
Free on-site parking
Casual dress environment