Grindr is a complex ecosystem of multiple technologies. The Site Reliability Engineer (SRE) is responsible for implementing automation solutions and for maintaining and improving the Grindr technical operations ecosystem. Solving challenges in distributed computing, high-performance computing, and high availability at runtime is a day-to-day theme. We are looking for a passionate technologist who enjoys complex problem solving.

Desired Skills and Experience

  • Deployment and support of the full lifecycle of applications in Amazon Web Services
  • Design, implement, document, and handle all aspects of Linux systems (CentOS, Debian, Ubuntu)
  • Identify repetitive, manual tasks and automate them
  • Develop effective tooling, alerts, and response procedures to identify and address reliability risks
  • Participate in the on-call rotation (PagerDuty) with other members of the Performance and Reliability Teams
  • Engage with product engineering teams to triage production outages and carry forward action items to improve ongoing reliability
  • Evangelize cloud and DevOps-centric best practices to improve the reliability, performance, and cost-efficiency of our stack
  • Evaluate bleeding-edge technologies for our use
  • Assist in after-hours deployments
  • Work with the Development team to build and maintain Java runtime and MySQL environments
  • Write and maintain moderately complex scripts (Bash, Python, Ruby, JavaScript, and/or Perl) to help automate and scale operations
  • BS degree in engineering or equivalent work experience
  • An understanding of high-traffic, large-scale distributed systems and the ability to perform root cause analysis on stability and performance related events in such environments
  • Familiarity with continuous integration and continuous deployment systems and the ability to describe pros, cons, and pitfalls of the various solutions
  • High familiarity with Git and version control systems
  • Experience with Linux systems; must understand how processes, users, groups, privileges and package managers work
  • Hands-on experience with backup and restore tools
  • Experience with automation and configuration management systems such as Puppet, Ansible, Salt, etc.
  • Competency with PostgreSQL, Cassandra, Redis, Amazon Redshift
  • Expert proficiency in UNIX scripting languages (Bash, Ruby, Python) and some experience with compiled languages (Go, Java, etc.)
  • Experience with configuration and troubleshooting of Linux, Java, Tomcat, and other middleware technologies
  • Passion for resolving reliability issues and identifying strategies to mitigate them going forward
  • Experience with cloud computing platforms (particularly AWS) a plus
  • Strong Linux system-level analysis capabilities
  • Passion for clear communication, especially prioritizing concerns to align with team and business goals
  • Deep network analysis experience
  • Experience with Terraform and Atlas
  • Thorough understanding of low level networking
  • Experience with Elasticsearch and Amazon Aurora (MySQL)

Benefits

  • Full coverage medical and dental insurance
  • Unlimited sick policy
  • Competitive Salaries and Options
  • 401(k)
  • Catered breakfast, every day
  • Stocked kitchen
  • Free on-site parking
  • Casual dress environment