Principal Software Engineer
In order to delight customers in a cloud-first world, Microsoft will continue to provide highly available services with exceptional quality, with new features and functionality lighting up on a regular basis. Availability, latency, and performance are all things that our customers demand of cloud services, and they are typically differentiators when customers decide which cloud provider to use. As the arms race heats up between Azure and our competitors, platform reliability becomes THE competitive advantage. As Microsoft continues to make a deeper investment in Cloud First with Azure, and with an expanding customer base that relies on our services to run their diverse businesses, quality of service is paramount. Our customers demand it and our competitors are striving for it. Speed and agility enable quality, so anything that can be done to ease the path to get features out is critical. We must knock down operational overhead by designing, engineering, and automating our way out of repeatable motions.
Enter the Azure Site Reliability Engineering (SRE) team. What is SRE? SRE is what you get when you treat operations as if it’s a software problem. Our mission is to progress, protect, and provide for the software and systems behind all Azure platforms and services (Storage, DNS, Networking, Compute, Service Bus, Event Hub, IoT, etc.) with an ever-watchful eye on their availability, latency, performance, and capacity. SREs will be responsible for bulletproofing, reinforcing, ruggedizing, and generally improving the quality of service and innovation throughput of the services they are accountable for. The team will be staffed with software engineers as well as system and network engineers who have an affinity for quality of service, improving operability, and providing high availability.
The team will own their services in production and drive reliability and performance at massive scale by mastering the full depth and breadth of the stack.
As a new team forming up in Azure, the SRE team has the opportunity to define the responsibilities, accountabilities, success metrics, methodologies, and operating procedures for SRE, and to provide the thought leadership and execution of improved site reliability at Microsoft. As an engineer on the SRE team, you will have full access to the technology stack and be responsible for hardening, scaling, monitoring, and ensuring world-class uptime.
Responsibilities:
Design, write, and deliver software to improve the availability, scalability, latency, and efficiency of Azure’s services.
Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions.
Influence and create new designs, architectures, standards, and methods for large-scale distributed systems.
Engage in service capacity planning and demand forecasting, software performance analysis, and system tuning.
Conduct periodic on-call duties using a follow-the-sun model (on an as-needed basis).
Build and drive consensus toward common goals and priorities through advanced impact and influence skills.
Qualifications desired:
10+ years of development experience.
5+ years of distributed systems experience.
Strong engineering background, with experience as both an engineering leader and a programmer.
Experience shipping, operating, and improving large-scale distributed systems.
Experience in SDLC, distributed systems, networking, hardware, logistics and operations, or capacity planning.
Ability to build and influence broadly toward common goals and priorities through advanced impact and influence skills.
Firm sense of accountability and ownership for the end-to-end project lifecycle, with solid project management and communication skills.
Expertise in problem solving and in analyzing global-scale distributed systems and critical production service environments.
Capable of technical deep-dives into networking, service design, operating systems, and storage, yet verbally and cognitively agile enough to hold your own in a strategy discussion with Azure’s leadership team.
Understanding of the overall system architecture, clients, features, and service dependencies.
Experience with defining and measuring internal and customer-facing SLAs.
Statistics experience and a bias for measurement and for driving action with metrics.
CS degree or equivalent experience.
AZSREAZ16
Desired Skills and Experience
See application page for details