Tech Lead - Software Engineer - Site Reliability Engineer
Because we work in financial technology and deal with real money, fault tolerance and system reliability are paramount. The SRE team is at the center of the action, and we’re looking for someone to build and lead the team.

The SRE function combines software engineering with knowledge of networking and systems to design and build software for our engineering teams. The software and systems produced by our SRE team enable our product and platform engineers to build and run our distributed, fault-tolerant financial platform and infrastructure, which is central to our long-term company mission. The primary goals of the Site Reliability Software Engineer are reliability, scalability, monitoring, alerting, and performance of the entire system.

In this role you’ll be responsible for technical leadership of the SRE team. This spans everything from infrastructure and project planning (working with other engineering team leads to define their needs, SLOs, and SLAs, and translating them into software and systems) to project execution and deployment (project management, code reviews, and hands-on coding).
WHAT YOU’LL DO
- Lead SRE team projects from specification and planning, through execution and coding with multiple engineers, to testing and deployment into production
- Work with your team to design, build, and deliver software that will enhance the scalability, availability, and efficiency of the Affirm platform and products. We use tools including Docker, AWS, Jenkins, New Relic, and Rollbar, but you’ll be encouraged to build your own tools or leverage open source resources to create the optimal infrastructure
- Address complex software and systems issues such as distributed change propagation on live serving systems
- Work proactively across the company to ensure the Affirm infrastructure is never a constraint for the engineering team or any aspect of the company
- Design & review software, architecture, and methods for operating services and systems
- Participate in software and system performance analysis and tuning, service capacity planning, and demand forecasting
- Analyze and debug performance issues across distributed services
- Plan, design, and build our infrastructure to scale with an increasing number of users, features, business requirements, partners, and new engineers
- Mentor more junior engineers as they progress toward their business and personal career goals
WHAT WE LOOK FOR
- Passion and drive to change consumer banking for the better
- Hands-on software development experience in a dynamically typed language. We primarily use Python, but members of our team have backgrounds in a wide range of technologies
- Experience building real-time distributed web services in the consumer or SaaS space
- Ability and interest in picking up new technologies quickly
- Strong software engineering fundamentals are far more important than familiarity with specific tools or languages
- Experience with system monitoring and alerting for availability and performance
- Strong expertise with Linux/Ubuntu internals
- Experience troubleshooting mission-critical services and software not written by you
- Understanding of network protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing
Desired Skills and Experience
See application page for details