Site Reliability Engineer
With Higher Logic in Arlington, VA, US
Posted on March 23, 2020
About this job
Compensation: $80k - 100k
Job type: Full-time
Experience level: Junior, Mid-Level
Role: DevOps, System Administrator
Industry: Collaboration Tools, Enterprise Software
Company size: 201–500 people
Company type: Private
Technologies: amazon-web-services, devops, docker, iis
This is a full-time position with the Engineering Team. The Site Reliability Engineer will work both collaboratively and independently on multiple complex, concurrent projects to deliver technical solutions, execute roadmaps, and promote DevOps best practices within the organization. Success in this role depends on a high level of technical skill in a 24x7x365 global production environment, combined with a positive attitude, a solutions-oriented mindset, and good working relationships with coworkers.
DevOps at Higher Logic has primary responsibility for service reliability in the production environment. We live at the juncture of Engineering, Operations, and Support, which means that we interact with large swathes of the company every day. The company frequently introduces new products, features, and services; these changes require a flexible, thoughtful, and forward-thinking approach to scalability and performance. As an SRE, you will have the opportunity to perform hands-on configuration and tuning of services while working to build out independent microservice architectures. Managing this environment requires deep individual knowledge and capability, coupled with optimism, focus, and close teamwork across the company.
Higher Logic operates at a large scale, serving tens of millions of end users every day. The entire technical stack is well on its way to being fully cloud-native. No matter how much you know, you will learn and grow here.
Responsibilities
- Assisting with management and configuration of AWS cloud infrastructure components.
- Supporting the Engineering team’s efforts by configuring and monitoring real-time alerting systems.
- Helping create strong feedback loops across all business lines for communicating, documenting, and remediating operational incidents.
- Actively supporting security and compliance functions.
- Decreasing the incidence, scope, and severity of operational failures (improving MTTR and MTBF).
- Guiding products to Production Readiness (scalability, observability, operability, resiliency, etc.).
- Creating, maintaining, and operating build and deployment automation (CI/CD pipelines).
- Providing tier-three on-call technical support.
Qualifications
- Familiarity with AWS services (EC2, S3, IAM).
- Understanding of infrastructure as code (IaC) and related tools such as Terraform, Chef, Puppet, and Ansible.
- Experience with SQL Server, IIS, HAProxy.
- Desire to improve products, technology, people, and processes.
- Appreciation of the value of diversity of opinions, approaches, and backgrounds.
- Excellent communication and collaboration skills.
- Understanding of the value of incremental solution delivery, proofs of concept (POCs), minimum viable products (MVPs), etc.
- Bachelor’s degree or higher in Computer Science or MIS, or equivalent commercial experience.
- Two or more years of SRE work on Windows Server and Linux.
- Experience with Autoscaling, RDS, Aurora, Postgres, ECS, Docker, Fargate, Redis, Memcached, S3, SQS, SES, SNS, Secrets Manager, Lambda, CloudWatch, Active Directory and ADFS, CI/CD, and running containers at scale.
- Proficiency in at least one high-level or scripting language; Python, Bash, PowerShell, or C# preferred.
- Familiarity with Agile (Kanban and Scrum).