Desired Skills and Experience

  • Defines, owns and delivers process and tooling that supports reliability, including incident management and post-incident review
  • Guides and trains teams in these processes so they get maximum benefit with minimal friction
  • Acts as an advocate for these processes by running meetups and “War Games” sessions, and by blogging and presenting internally and externally
  • Measures the business-relevant outcomes of the processes you own, including incident rate and time-to-recovery
  • Experience managing major incidents and post-incident reviews at comparable technology organisations
  • A track record of measurably improving reliability results across teams through your own initiatives
  • Experience working with other engineering leaders as a trusted reliability subject matter expert
  • Sufficient technical nous to understand complex, high-scale information systems
  • Examples of your data-driven approach to process measurement and improvement
  • Demonstrated program leadership and accountability generation abilities
  • Some ability to write code and implement automation that supports reliability
  • A positive and enthusiastic attitude
  • Project and program management experience, including goal-setting & measurement, and stakeholder management
  • Experience and desire to present your expertise to large groups, e.g. at all-hands meetings and conferences
  • Experience developing and delivering training in an organisation comparable to Atlassian
  • More extensive current or former experience in software development or release management
  • Formal training or qualification in incident management or post-incident reviews
  • Experience and skill in data analysis and reporting (e.g. SQL, ETL systems, and data visualisation)
  • Program Management around incident management (IM) and post-incident reviews (PIRs)
      • Goal-setting and measurement
      • Accountability generation in a matrix organisation
      • Stakeholder management and communication - i.e. “People skills”
  • Ownership of IM & PIR process and tooling
      • Continuous process measurement and improvement
      • Internal advocacy for process excellence across many disparate teams
      • Development and delivery of IM/PIR training across the company
      • Developing supporting tooling and automation
  • Analysis and Reporting
      • Analysis across groups to draw valid conclusions about the drivers of reliability
      • Regular and ad-hoc reporting to key stakeholders
      • Domain model, batch job, and report creation and maintenance
  • Hands-on Incident Management and PIR
      • Manage major incidents in an on-call roster as part of our global major incident management team
      • Lead incident teams to resolve major incidents quickly and effectively
      • Drive post-incident reviews to turn failures into resilience
