Site Reliability Engineering Manager

Bengaluru | February 25, 2026 | Apple Custom ATS

Summary

Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other’s ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It’s the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you’ll do more than join something; you’ll add something.

Apple's Artificial Intelligence and Data Platforms (AiDP) team is seeking an experienced Site Reliability Engineering (SRE) Manager to support the scalable and resilient distributed systems that power Apple's data pipelines and analytics platforms. Our Enterprise Data Warehouse landscape supports a wide variety of real-time, near real-time, and batch analytical solutions. These solutions are an integral part of business functions like Sales, Operations, Finance, AppleCare, Marketing, and Internet Services, enabling business drivers to make critical decisions. We use proprietary and open source technologies such as Kafka, Spark, Iceberg, and Airflow to build these solutions. If you are passionate about addressing infrastructure challenges at scale, both on-premises and in the cloud, and focused on optimizing scalable solutions by prioritizing ease of use and maintenance, you will discover exciting opportunities in AiDP.

Description

As a hands-on SRE Manager, you’ll lead by example: actively driving operational excellence, contributing to code, and ensuring system reliability. You will be deeply involved in incident response across complex, distributed data platforms designed to support data exploration, analytics, and reporting solutions. These platforms operate at the intersection of high data volume and hybrid infrastructure, spanning both cloud and on-premises environments.

Minimum Qualifications

Hands-on experience supporting and maintaining applications in cloud or hybrid environments
Expertise in cloud-native services, including ETL frameworks (Apache Spark, Flink) and messaging systems (Kafka)
Strong knowledge of cloud infrastructure and services (e.g., AWS, GCP, Kubernetes) and observability tools (e.g., Prometheus, Grafana, CloudWatch)
Programming experience in Python, Java, or Scala
Proven ability to lead incident response, perform root cause analysis, and drive system reliability improvements
Bachelor’s degree or equivalent, with 10+ years of experience in the SRE domain and at least 2 years in a management role focused on leading, hiring, developing, and building teams

Preferred Qualifications

Hands-on experience supporting enterprise data systems on distributed architectures
Exposure to data visualization tools such as Tableau, Business Objects, and ThoughtSpot, with experience supporting and troubleshooting issues related to dashboards and reports
Experience with modern and distributed databases such as Snowflake, Cassandra, SingleStore, and SAP HANA
Experience using GenAI or automation tools for issue detection, alerting, or remediation
Solid understanding of system design, data structures, and incident management best practices

How to Get Hired at Apple

Apple's custom ATS requires extra attention to resume formatting and keyword optimization; don't assume standard ATS tricks will work identically.
Tailor every application to the specific role and team. With many open positions across vastly different functions, generic applications are unlikely to succeed.