Position Details:

This position falls within American Family Insurance’s DevOps organization within I/S. This position may also be referred to as a Site Reliability Engineer.

The Application Reliability Engineer is responsible for ensuring application reliability, resiliency and performance. The position is accountable for facilitating communication, collaboration and integration between operations and software development. The Application Reliability Engineer ensures our customers get the best quality of service and uptime. The engineer will identify where to expect failures and how to tolerate them, both in our own systems and in those we depend upon. The position will work closely with our developers and architects to build services and applications that maintain limited functionality even when portions of the application become inoperative. The engineer is responsible for ensuring the systems and applications we launch remain available, reliable and efficient even as their duties scale and evolve.

System Optimization (60%)

  • Develop and implement processes and guidelines for continuous performance monitoring, tuning and capacity modeling.
  • Diagnose and resolve latent and systemic reliability issues by performing system-level troubleshooting across the entire stack: hardware, software, application and network.
  • Develop engineering solutions to failures and all other problems that adversely affect site reliability and uptime, including capacity, performance, stability and security issues.
  • Develop tools and procedures to manage demand on our systems when that demand is too high, e.g. degrading services gracefully, prioritizing users, removing low-priority traffic, and displaying intelligent banners.
  • Work with our developers and architects to design and integrate systems that respond consistently to failures by gracefully degrading our services.
  • Enable and support the growth and scaling of products and services by identifying inefficiencies in our current systems and planning for growth in both new and existing systems.
  • Instigate planned and spontaneous “fire drills” to continually test our systems’ ability to deal with failures and identify weak points that need improving.

Technical Leadership (40%)

  • Leads and/or participates in systems analysis, general systems design, specification development for vendor contracts, and detailed systems design as needed based on expertise.
  • Provides technical leadership in the ongoing adoption and development of software engineering procedures, standards and methods.
  • Collaborate and ensure effective communication between cross functional teams of Network and Security Architects, Software Developers and Engineers, Server and Platform Engineers, Testing, Support teams and Data Center Operations.
  • Collaborate with the engineering teams and lead the triage of high priority production incidents while bringing about changes to improve reliability.
  • Makes recommendations and presentations to senior management regarding technical issues, technical investments, and strategic directions.
  • Maintains up-to-date awareness of industry developments and best practices.
  • Mentor junior team members.

Specialized Knowledge and Skills Requirements

  • Extensive knowledge and understanding of infrastructure technologies, operating systems, and the interconnectivity between infrastructure platforms and software tools.
  • Extensive knowledge and understanding of systems development life cycle (SDLC).
  • Demonstrated hands-on experience with the engineering, management and administration of large-scale distributed systems (hardware configuration, OS, platform, network and data modeling).
  • Demonstrated experience analyzing and understanding complex software/systems.
  • Demonstrated experience developing alternative solution-delivery and design approaches for customers.
  • Demonstrated experience developing complex software/systems using one or more programming languages.
  • Demonstrated experience building and managing fault-tolerant, self-healing systems.
  • Solid knowledge and understanding of networking, security and database concepts.
  • Solid knowledge and understanding of application architecture and design alternatives.
  • Solid knowledge and understanding of integration and migration strategies and technologies.

Position Details

  • Offer to the selected candidate will be made contingent on the results of applicable background checks.
  • Relocation assistance is available.

Desired Skills and Experience

See application page for details