Senior Director - Reliability Operations
About the Role
The Senior Director - Reliability Operations is a strategic leader accountable for ensuring the reliability, availability, and performance of the enterprise technology ecosystem. This role oversees all ITIL-based service management functions, Site Reliability Engineering (SRE), the ServiceNow Platform, Mission Control, and Live Sight Insights.
This leader drives operational excellence through a proactive reliability strategy that combines process discipline, automation, observability, and real-time insights. They will partner closely with engineering, infrastructure, cybersecurity, and product teams to build and sustain resilient systems that power Gap Inc.’s digital and in-store experiences.
As a thought leader, the Sr. Director will shape the long-term vision for operational reliability and service management—defining modern capabilities, optimizing service performance, and establishing an innovation-driven reliability culture.
What You'll Do
Strategic Leadership & Vision
Define and execute the enterprise Reliability Operations strategy, ensuring alignment with business objectives and technology roadmaps.
Lead transformation of ITIL functions into agile, data-driven service management capabilities across incident, problem, change, and configuration management.
Partner with senior technology and business leaders to embed reliability and performance metrics into product development and operational planning.
Operational Excellence & Reliability Engineering
Lead Site Reliability Engineering (SRE) practices across platforms and services—driving automation, self-healing capabilities, and proactive monitoring to achieve measurable service resiliency improvements.
Establish standards for availability, latency, scalability, and operational efficiency through engineering-driven reliability principles.
Champion reliability by design—ensuring observability, capacity planning, and chaos testing are core to delivery processes.
Mission Control & Live Sight Insights
Oversee the Mission Control organization responsible for real-time system monitoring, incident command, and critical event management.
Drive adoption of Live Sight Insights to create predictive and actionable intelligence on service health and performance trends.
Enable enterprise visibility of key metrics through intuitive dashboards and business-impact-based alerting models.
ServiceNow Governance Ownership
Own the ServiceNow Platform governance strategy and roadmap, ensuring it enables ITIL process excellence, automation, and collaboration on cross-enterprise workflow integration.
Collaborate with product and engineering teams to apply industry best practices across ServiceNow capabilities, including IT, HR, Security, and Enterprise Operations.
Promote a platform governance mindset focused on reliability, scalability, and ease of use.
People Leadership & Culture
Build, inspire, and develop a high-performing global Reliability Operations team that embodies accountability, collaboration, and innovation.
Foster a culture of data-driven decision making, continuous learning, and operational excellence.
Serve as a mentor and coach to emerging leaders—raising the organizational bar for reliability engineering and service leadership.
Cross-Functional Partnership
Work closely with Software Engineering, Infrastructure, Cybersecurity, and Business Technology teams to ensure reliability objectives are integrated end-to-end.
Partner with Enterprise Architecture and Program Management to align technology investments with reliability outcomes.
Act as a trusted advisor to executive leadership on reliability strategy, risk posture, and performance health of the enterprise environment.
Who You Are
Proven strategic leader with more than 10 years of success driving operational transformation at scale in global, complex environments.
Deep expertise in ITIL frameworks, SRE principles, ServiceNow platform administration and architecture, and modern observability practices.
Strong technical understanding across infrastructure, cloud operations, automation, and service management ecosystems.
Exceptional ability to influence at all levels—translating technical reliability concepts into business impact and strategic value.
Passionate about developing people and creating a culture of ownership, reliability, and continuous improvement.
Demonstrated track record of leading large, diverse teams and delivering measurable improvements in service reliability, performance, and user satisfaction.
A high-performing leader who operates with strategic agility and executive presence, and who builds organizational alignment through clarity, accountability, and purpose.