IND Senior Manager, Infrastructure
We’re determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too. Join our team as we help shape the future.
Role Summary
The IT Operations Manager is a people manager and operational leader responsible for maintaining the availability, stability, and recoverability of critical infrastructure and business services within a 24/7/365 Global Command Center. This role leads an operations team responsible for Incident, Problem, Change, and Release Management controls, real‑time event response, and service restoration. The Command Center operates in a highly automated, AI‑augmented model, leveraging observability platforms, ITSM workflows, agentic AI, and continuous improvement to reduce incident impact, accelerate recovery, and prevent recurrence.
Key Responsibilities
People Leadership
- Directly manage, coach, and develop a team of IT operations professionals.
- Own staffing plans, shift schedules, on‑call rotations, and follow‑the‑sun handoffs.
- Set performance expectations, conduct reviews, and drive skills development and operational maturity.
- Support hiring, onboarding, and retention in partnership with HR and onshore leadership.
Operational Command & Availability
- Provide leadership oversight for day‑to‑day Command Center operations, ensuring rapid detection, triage, escalation, and restoration of service disruptions.
- Act as an escalation leader for high‑severity incidents, ensuring calm execution and cross‑team coordination.
- Ensure consistent execution of standard operating procedures across all shifts.
Incident Management
- Own and enforce Incident Management controls including queue health, SLAs/OLAs, escalation paths, and communications.
- Ensure effective major incident execution, including bridge facilitation, stakeholder communications, and post‑incident reviews.
- Drive accountability and learning from incidents to reduce recurrence.
Change & Release Management
- Ensure operational controls are in place for Change and Release execution, including readiness validation, release support, and early‑life monitoring.
- Oversee emergency changes during incidents, ensuring governance, documentation, and post‑implementation follow‑up.
- Partner with Change and Engineering teams to reduce change‑related incidents.
Continuous Improvement & AI‑Augmented Operations
- Ensure root cause analysis, corrective actions, and knowledge capture are completed for significant incidents.
- Drive automation, runbook maturity, and improved alert quality.
- Lead teams operating with AI‑enabled and agentic automation capabilities, ensuring appropriate human oversight.
Metrics & Governance
- Own operational KPIs (availability, MTTR/MTSR, incident recurrence, change impact).
- Provide regular operational insights, risks, and improvement actions to onshore leadership.
Required Qualifications
- 8+ years of experience in IT Operations, NOC, Command Center, or Service Management roles.
- Prior people management experience in a shift‑based or 24/7 operations environment.
- Strong working knowledge of Incident, Change, and Release Management (ITIL‑aligned).
- Proven experience leading teams through high‑severity production incidents.
- Strong communication, coaching, and stakeholder‑management skills.
Preferred Qualifications
- Experience in automation‑heavy or AI‑augmented operations environments.
- Familiarity with enterprise ITSM platforms and observability tools.
- Demonstrated success driving operational improvements and reducing incident recurrence.
Measures of Success
- High service availability and reduced customer impact
- Improved MTTR/MTSR and incident recurrence rates
- Strong team performance, engagement, and operational discipline
- Increased automation and effective use of AI in operations