Highfive is a cloud-based service supporting tens of thousands of concurrent users in a mission-critical, real-time communication application. We are looking for someone to augment our software engineering team by seeding a site reliability engineering (SRE) team that will grow alongside the service technology, eventually enabling 99.99% global uptime and hundreds of thousands of real-time audio/video streams. As Highfive succeeds, this role is expected to grow into a director-level position.

What you'll be doing:

  • Design, implement, productionize, and maintain site reliability processes and systems
  • Educate the platform software engineering team on reliability best practices, and work to change the software engineering process so that it incorporates reliability principles
  • Respond to escalated service outages alongside software engineers
  • Recruit and manage an adaptable, high-velocity team

Desired Skills and Experience

Qualifications

  • Expertise in site reliability engineering in a multi-datacenter production cloud environment with demanding up-time, real-time performance, and security requirements
  • Experience adopting and using open-source and commercial technology products, and building in-house tools where appropriate, in support of the SRE mission
  • Ability to recruit and manage a high-performing team
  • Experience engaging and negotiating with equipment and service vendors and partners
  • Comfort working with senior management to allocate and prioritize engineering effort in support of the SRE mission, in a real-world environment constrained by limited resources and bandwidth
  • Experience instituting, tracking, and being accountable for a global uptime metric

Additional awesomeness

  • Experience building an SRE organization from scratch
  • Experience with physical datacenters and network services in a production context
  • Experience deploying infrastructure overseas, particularly in environments with unique challenges such as mainland China
  • Expertise in cloud network security