
Principal Technical Program Manager

Microsoft
$163,000.00 - $296,400.00 / yr
United States, Texas, Irving
7000 State Highway 161
Feb 22, 2026
Overview

Microsoft Azure operates one of the world's largest and most complex cloud compute fleets. As a Principal Technical Program Manager (TPM) in Compute Fleet Infrastructure, you will lead cross-functional initiatives that ensure node-level health, availability, and automated recovery across Azure's global fleet, directly supporting the reliability and stability of customer workloads at scale.

This role operates at the intersection of hardware, host operating system (OS), virtualization, control plane services, and data center operations. The mission is to transform low-level node health signals into predictable, automated, and scalable recovery outcomes, protecting customer workloads while continuously raising the reliability standards of the Azure platform.

You will own end-to-end programs that span health signal definition, fleet-wide detection, mitigation strategies, and recovery automation. This work involves close collaboration with engineering, hardware, site reliability engineering (SRE), and operations teams to drive coordinated execution and measurable improvements across the compute fleet.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.



Responsibilities
  • Own node health strategy across Azure compute fleets, including bare metal and virtualized environments.
    Define what "healthy" means at the node level, aligning hardware, firmware, host OS, and virtualization signals into a consistent fleet health model.
    Drive measurable improvements in node availability, repair success rates, and recovery times across regions and SKUs.
  • Automated Detection, Mitigation, and Recovery: Lead programs that detect unhealthy nodes early, prevent customer impact, and automate recovery actions (e.g., repair, reprovisioning, isolation, or migration).
    Partner with engineering teams to close gaps between signal detection and actionable remediation.
    Ensure recovery mechanisms scale safely across large-scale, heterogeneous fleets.
  • Cross-Team Program Execution: Coordinate work across multiple organizations, including compute platform engineering, hardware systems, data center operations, and site reliability teams.
    Translate ambiguous reliability problems into clear program plans, milestones, and success metrics.
    Identify systemic issues and drive long-term fixes rather than repeated tactical mitigations.
  • Metrics, Insights, and Continuous Improvement: Define and track fleet-level health KPIs (e.g., nodes in service, recovery success, time-to-repair).
    Use data and post-incident learnings to prioritize investments that reduce repeat failures.
    Represent node health and recovery readiness in executive and operational reviews.


Qualifications

Required Qualifications

  • Bachelor's Degree AND 8+ years of experience in engineering, product/technical program management, data analysis, or product development
    • OR equivalent experience.
  • 6+ years of experience managing cross-functional and/or cross-team projects.

Other Requirements

  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications

  • Bachelor's Degree AND 15+ years of experience in engineering, product/technical program management, data analysis, or product development
    • OR equivalent experience.
  • 10+ years of experience managing cross-functional and/or cross-team projects.
  • 1+ year(s) of experience reading and/or writing code (e.g., sample documentation, product demos).

#azurecorejobs

Technical Program Management IC6 - The typical base pay range for this role across the U.S. is USD $163,000 - $296,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $220,800 - $331,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
