Desired Skills and Experience

  • Service Levels and Adherence to them
  • Monitoring Strategies, Tools and Procedures
  • Triage Procedures (including enhancing existing triage procedures)
  • Production support readiness documentation
  • Actively manage relationships with key stakeholders, markets and resolver groups
  • Respond to service-level issues and work to restore normal service operations as quickly as possible
  • Identify and lead the implementation of creative process and technology solutions within the team
  • Provide mentorship and team development opportunities
  • Assist in representing Production Support to the organization, ensuring that high availability and the ability to identify customer-facing issues are included in the development or deployment of new products and services
  • Identify and recommend opportunities for “clean-slate” process improvement with regard to incident management, fault monitoring, triage procedures and issue escalation
  • Develop procedures for incident triage and management, metric and measure creation, and the management and administration of monitoring tools
  • Oversee the timely execution of scheduled and repeatable processes such as periodic system validations, daily triage, system monitoring and event log management
  • Work with architecture, development and engineering teams to identify the root cause of recurring incidents and create an action plan for resolution
  • Monitor systems and services for most efficient operation, identifying fault conditions as well as opportunities for further optimization
  • Maintain escalation and contact lists for mission-critical systems and services