
Data Engineering at Airbnb in 2026: Airflow's Birthplace

In short

Data engineering at Airbnb sits at the origin point of two of the industry's most-used open-source tools. Maxime Beauchemin built Apache Airflow there in 2014 and later open-sourced Apache Superset from the same team. The platform now spans Airflow for orchestration, Superset for self-serve BI, the Minerva metrics framework for consistent definitions, and Spark on AWS EMR and EKS for batch and streaming. DEs at Airbnb work close to the open-source layer they inherited and own metric correctness across Hosts, Guests, and Trust products.

Key takeaways

  • Apache Airflow originated at Airbnb in 2014, created by Maxime Beauchemin to orchestrate the company's growing data workflows.
  • Apache Superset also began at Airbnb as a self-serve data exploration and visualization layer on top of the warehouse.
  • Minerva is Airbnb's internal metrics framework that enforces a single definition for KPIs across teams, dashboards, and experiments.
  • The core stack in 2026 is Airflow + Spark on AWS EMR and EKS, with Superset and Tableau on top and Iceberg-style table formats below.
  • Interviews emphasize SQL depth, distributed-systems reasoning, and a dimensional-modeling round grounded in Airbnb-style marketplace data.
  • Total comp for senior (IC4) DEs in the Bay Area generally lands in the $280K-$360K range per Levels.fyi, with IC5+ pushing higher on equity.
  • Open-source contribution is part of the culture: many DEs ship code to Airflow and Superset as part of their day job.

DE at Airbnb in 2026: open-source heritage

Airbnb is one of the rare companies where the data engineering team's day-to-day tools were born inside the team. Apache Airflow was started by Maxime Beauchemin at Airbnb in October 2014 to replace a tangle of cron jobs and ad-hoc scripts; the original engineering blog post introducing it framed Airflow as "a workflow management platform" built around DAGs, operators, and a scheduler. Apache Superset followed in 2015 as an internal data-exploration tool that grew into one of the most-starred BI projects on GitHub and graduated to an Apache top-level project in 2021.

That heritage shapes hiring. DEs at Airbnb are expected to be fluent in Airflow internals, comfortable reading Spark physical plans, and opinionated about metric design. The team treats metric correctness as a product surface, not a back-office concern, because the same numbers feed Host payouts, search ranking experiments, and Trust & Safety models. New DEs typically rotate through a domain team (Homes, Experiences, Payments, Trust) and inherit ownership of a slice of the Minerva metric catalog.

The team also carries a quiet expectation that you will leave the platform better than you found it. Many internal libraries eventually ship as open source, and the engineering blog is treated as a real publication channel rather than a marketing surface; posts on Minerva, Bighead, and the warehouse architecture have become reference reading across the industry. For DEs evaluating offers, this is the strongest single differentiator vs. peer companies: the work compounds into public artifacts you can point to for the rest of your career.

Interview process

The DE loop at Airbnb in 2026 is five to six rounds after the recruiter screen, run on Zoom for most candidates, with an optional onsite in San Francisco for finalists. The rounds are recognizable to anyone who has interviewed at FAANG-tier data orgs, but with a marketplace flavor.

  • Recruiter screen (30 min): background, level calibration, comp expectations.
  • Technical phone screen (60 min): two SQL problems on a marketplace schema (bookings, listings, reviews) and one short Python or Scala data manipulation question.
  • SQL deep-dive (60 min): window functions, cohort retention, anti-join debugging on a Hosts & Guests fact table.
  • Data modeling (60 min): design a warehouse model for a new product surface (e.g. Experiences pricing). Star vs. snowflake, slowly changing dimensions, grain choices, and how the model would land in Minerva are all fair game.
  • Pipeline / systems design (60 min): design an Airflow + Spark pipeline end-to-end, including backfill strategy, idempotency, partitioning, and alerting.
  • Behavioral / values (45-60 min): Airbnb's core values, conflict, ambiguity, and a deep-dive on a project the candidate owned.
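The "anti-join debugging" topic in the SQL deep-dive often comes down to how NULLs interact with NOT IN. The sketch below is a minimal illustration, not an actual interview question: the listings and bookings tables and their columns are hypothetical, and SQLite stands in for the warehouse engine. It contrasts a NOT IN anti-join that silently returns nothing with a NOT EXISTS version that returns the listings with no bookings.

```python
import sqlite3

# Hypothetical marketplace schema, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE listings (listing_id INTEGER PRIMARY KEY, host_id INTEGER);
    CREATE TABLE bookings (booking_id INTEGER PRIMARY KEY, listing_id INTEGER);
    INSERT INTO listings VALUES (1, 10), (2, 10), (3, 11);
    -- Booking 102 has a NULL listing_id, e.g. from a bad upstream join.
    INSERT INTO bookings VALUES (101, 1), (102, NULL);
""")

# Buggy anti-join: when the subquery contains a NULL, `x NOT IN (...)`
# evaluates to NULL (never TRUE) for every non-matching x, so the
# unbooked listings are filtered out.
not_in_rows = conn.execute("""
    SELECT listing_id
    FROM listings
    WHERE listing_id NOT IN (SELECT listing_id FROM bookings)
    ORDER BY listing_id
""").fetchall()

# NULL-safe anti-join: NOT EXISTS only asks whether a matching row exists.
not_exists_rows = conn.execute("""
    SELECT l.listing_id
    FROM listings AS l
    WHERE NOT EXISTS (
        SELECT 1 FROM bookings AS b WHERE b.listing_id = l.listing_id
    )
    ORDER BY l.listing_id
""").fetchall()

print("NOT IN:", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
```

A LEFT JOIN with an IS NULL filter on the right-hand key gives the same result as NOT EXISTS here; which form reads best is largely a matter of team convention.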

Strong signal in the modeling and pipeline rounds is what differentiates IC4 offers from IC5+; staff candidates typically also get a cross-functional partnership round with a senior PM or DS lead. Interviewers grade explicitly for marketplace intuition: the same listing can be a Host-side fact, a Guest-side fact, and a Trust signal, and good DE candidates name those tensions instead of collapsing them into one table.

Tactically: be ready to talk through backfills without a whiteboard hand-wave (how do you reprocess thirty days of bookings without doubling Host payouts?), and have a position on slowly changing dimensions for listings, where pricing, availability, and amenities change daily. Candidates who only ever touched batch warehouses sometimes stumble on the streaming-adjacent questions; brushing up on event-time vs. processing-time before the loop pays off.
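One common answer to the payout question is to make each daily load overwrite its own partition instead of appending to it. The sketch below assumes a hypothetical payouts table partitioned by a ds date string and a made-up source extract; SQLite stands in for the warehouse, and nothing here reflects Airbnb's actual pipeline code. It runs thirty days twice (the original run plus a backfill) and compares a naive append against a delete-then-insert load.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payouts_naive (booking_id TEXT, ds TEXT, amount REAL);
    CREATE TABLE payouts (booking_id TEXT, ds TEXT, amount REAL);
""")


def source_bookings(ds):
    """Hypothetical source extract: one booking with a 100.0 payout per day."""
    return [(f"b-{ds}", ds, 100.0)]


def naive_append(ds):
    # Appends on every run, so a rerun of the same day duplicates rows.
    with conn:
        conn.executemany(
            "INSERT INTO payouts_naive VALUES (?, ?, ?)", source_bookings(ds)
        )


def idempotent_load(ds):
    # Delete and reinsert the partition in one transaction, so a rerun
    # of the same day replaces its rows instead of adding to them.
    with conn:
        conn.execute("DELETE FROM payouts WHERE ds = ?", (ds,))
        conn.executemany(
            "INSERT INTO payouts VALUES (?, ?, ?)", source_bookings(ds)
        )


days = [(date(2026, 1, 1) + timedelta(days=i)).isoformat() for i in range(30)]

# First pass is the original run; second pass is the thirty-day backfill.
for _ in range(2):
    for ds in days:
        naive_append(ds)
        idempotent_load(ds)

naive_total = conn.execute("SELECT SUM(amount) FROM payouts_naive").fetchone()[0]
idempotent_total = conn.execute("SELECT SUM(amount) FROM payouts").fetchone()[0]
print(naive_total, idempotent_total)
```

The same idea carries over to Spark as a partition overwrite keyed on the run's logical date; the important property is that the output depends only on the date being processed, not on how many times the job ran.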

Compensation by level

Airbnb levels DEs on the same IC ladder as software engineers: IC3 (mid), IC4 (senior), IC5 (staff), IC6 (senior staff). Per Levels.fyi data for the Data Engineer role in 2026, total compensation ranges roughly as follows for Bay Area candidates. Numbers below are directional and shift with stock price and offer cycle.

  • IC3 (mid): ~$210K-$260K total (base around $165K-$185K, equity vesting over four years, target bonus ~10%).
  • IC4 (senior): ~$280K-$360K total, the most common offer band for experienced DEs.
  • IC5 (staff): ~$360K-$480K+ total, with equity becoming the dominant component.
  • IC6 (senior staff): $500K+ total, heavily weighted toward RSUs and refreshers.

Airbnb publishes a standard benefits package (travel credit, health, 401(k) match, sabbatical at tenure milestones) and runs comp adjustments tied to Radford and internal benchmarks. Refreshers at performance reviews are meaningful and often close the gap between offer and steady-state TC.

A few practical notes. First, sign-on bonuses are negotiable and often used to bridge unvested equity from a current employer; bring a clear summary of what you are walking away from. Second, the equity refresh stack matters more than the headline grant: a strong IC4 hire who performs well typically sees their TC drift up over years two and three rather than down. Third, Airbnb's stock price has historically been volatile, so candidates who care about predictability should weight base and bonus more heavily in their negotiation.

Tech stack: Airflow + Superset + Minerva + Spark on AWS EMR/EKS

The 2026 stack is best understood as five layers, most of them grounded in tools either built at or heavily shaped by Airbnb.

Orchestration: Apache Airflow runs the bulk of batch DAGs, with custom operators wrapping internal services. Beauchemin's original design decisions (DAGs as code, parameterized operators, and a scheduler decoupled from execution) are still load-bearing a decade later.
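The three design ideas named above can be shown with a toy model in plain Python. This is not Airflow's API: make_task, schedule, and execute are hypothetical names, and the standard-library graphlib module stands in for a real scheduler. It is only a sketch of how a pipeline defined as code, with parameterized tasks, can be ordered by one component and run by another.

```python
from graphlib import TopologicalSorter


def make_task(kind, **params):
    """Toy 'parameterized operator': one template, many configured tasks."""
    def run(ds):
        return {"kind": kind, "ds": ds, **params}
    return run


# DAG as code: tasks and dependencies are ordinary Python objects.
tasks = {
    "extract_bookings": make_task("extract", table="bookings"),
    "extract_listings": make_task("extract", table="listings"),
    "join_facts": make_task("transform", output="fct_bookings"),
    "publish_metrics": make_task("publish", target="metrics"),
}
# Maps each task to the set of tasks that must finish before it.
upstream = {
    "join_facts": {"extract_bookings", "extract_listings"},
    "publish_metrics": {"join_facts"},
}


def schedule(upstream):
    """'Scheduler': decides a valid run order, knows nothing about execution."""
    return list(TopologicalSorter(upstream).static_order())


def execute(order, tasks, ds):
    """'Executor': runs tasks in the given order for one logical date."""
    return [tasks[name](ds) for name in order]


order = schedule(upstream)
results = execute(order, tasks, ds="2026-01-01")
print(order)
```

Because the schedule is just data, the executor could be swapped for one that runs tasks on remote workers without touching the DAG definition, which is the point of keeping the two decoupled.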

Compute: Spark is the workhorse. Most long-running pipelines run on AWS EMR, with EKS-based Spark adopted progressively for elastic, interactive workloads. Iceberg-style table formats sit under most warehouse tables to support time travel and schema evolution.

Metrics & semantics: Minerva is Airbnb's internal metrics framework. Instead of letting every dashboard re-implement "nights booked" or "active hosts," Minerva centralizes definitions, dimensions, and aggregation logic; downstream Superset and Tableau dashboards consume Minerva-resolved tables, which keeps experiment dashboards and finance reports in sync.
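The idea of defining a metric once and resolving it everywhere can be sketched with a small registry. Everything below is hypothetical (the Metric class, register, resolve, and the booking rows) and is not Minerva's actual interface; it only illustrates why a dashboard and an experiment readout that call the same definition cannot drift apart.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    compute: Callable[[Iterable[dict]], float]


REGISTRY = {}


def register(metric):
    # Refuse a second definition so each metric name has one meaning.
    if metric.name in REGISTRY:
        raise ValueError(f"metric {metric.name!r} is already defined")
    REGISTRY[metric.name] = metric


def resolve(name, rows):
    """Every consumer computes a metric through the shared definition."""
    return REGISTRY[name].compute(rows)


register(Metric(
    name="nights_booked",
    description="Sum of nights across non-cancelled bookings",
    compute=lambda rows: sum(r["nights"] for r in rows if r["status"] != "cancelled"),
))

# Hypothetical booking rows for illustration.
bookings = [
    {"booking_id": 1, "nights": 3, "status": "confirmed"},
    {"booking_id": 2, "nights": 2, "status": "cancelled"},
    {"booking_id": 3, "nights": 4, "status": "confirmed"},
]

dashboard_value = resolve("nights_booked", bookings)
experiment_value = resolve("nights_booked", bookings)
print(dashboard_value, experiment_value)
```

The design choice worth noting is the rejection of duplicate names: a team that wants a variant (say, including cancelled bookings) has to publish it under a different name rather than quietly redefining the original.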

Consumption: Apache Superset is the default self-serve BI tool for engineers and PMs; Tableau is used for finance and exec reporting. Jupyter and Hue cover ad-hoc analysis. ML feature pipelines increasingly land on Bighead-derived feature stores wired through Airflow, and experiment readouts are served through ERF, Airbnb's internal experimentation platform, which itself reads Minerva-resolved metric tables to keep readouts and dashboards aligned.

Storage and ingestion: the warehouse is S3-backed with Hive-compatible metadata, increasingly migrated to Iceberg for ACID-style table operations. Ingestion is a mix of Kafka-driven streams (click events, booking events, search logs) and batch extracts from operational Postgres and MySQL. CDC pipelines move row-level changes into the warehouse on configurable cadences, and partition discipline is enforced through code review rather than left to convention.

The result is a stack where the same DE can own a metric end-to-end: from Kafka topic, through Spark transformation, through Minerva definition, into the Superset chart a PM looks at every Monday morning. That vertical ownership is rare in the industry and is one of the strongest reasons mid-career DEs cite for joining and staying.

For DE candidates, the practical implication is that you will be asked, in interviews and on the job, to reason about metric definitions, partitioning strategies on Spark, and Airflow DAG idempotency at the same level of rigor as application engineers reason about service design.

One more piece worth understanding: the warehouse is treated as a versioned product. Schema changes go through review, breaking changes are coordinated, and the team invests in lineage tooling so a downstream Superset chart can be traced back through Minerva, through Airflow DAGs, to the source ingestion job. Candidates who have only worked in environments where the warehouse was a junk drawer often have to recalibrate. The upside is that once you ship something at Airbnb, it tends to stay shipped.
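Tracing a chart back to its ingestion jobs is, at its core, a graph walk over upstream edges. The sketch below uses a hypothetical lineage map with invented node names; it does not describe Airbnb's lineage tooling, only the kind of traversal such tooling performs.

```python
# Hypothetical lineage edges: each node maps to the nodes it reads from.
UPSTREAM = {
    "superset.weekly_bookings_chart": ["minerva.nights_booked"],
    "minerva.nights_booked": ["warehouse.fct_bookings"],
    "warehouse.fct_bookings": ["airflow.build_fct_bookings"],
    "airflow.build_fct_bookings": [
        "ingest.kafka_booking_events",
        "ingest.mysql_listings_extract",
    ],
}


def trace_to_sources(node, upstream):
    """Depth-first walk that returns the nodes with no upstream of their own."""
    seen, stack, sources = set(), [node], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = upstream.get(current, [])
        if not parents:
            # Nothing feeds this node, so it is a source ingestion job.
            sources.add(current)
        stack.extend(parents)
    return sources


sources = sorted(trace_to_sources("superset.weekly_bookings_chart", UPSTREAM))
print(sources)
```

Reversing the edges gives the opposite question, which matters just as much for schema review: which charts would a change to a given ingestion job affect.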

Frequently asked questions

Did Apache Airflow really start at Airbnb?
Yes. Maxime Beauchemin created Airflow at Airbnb in October 2014 to manage the company's batch workflows, and the team open-sourced it the following year. It was incubated at the Apache Software Foundation in 2016 and graduated to a top-level project in 2019.

Is Apache Superset also from Airbnb?
Yes. Superset was started by Beauchemin and the data team at Airbnb as an internal data-exploration tool, then open-sourced. It is now an Apache top-level project and one of the most popular self-serve BI tools in the ecosystem.

What is Minerva?
Minerva is Airbnb's internal metrics framework. It centralizes the definition of business metrics so every team, dashboard, and experiment uses the same logic. DEs at Airbnb are typically responsible for modeling source data into Minerva-compatible inputs.

What languages should I know for an Airbnb DE interview?
Strong SQL is non-negotiable. Python is expected for Airflow DAGs and data manipulation. Scala or PySpark fluency is a plus for the pipeline-design round. Java appears occasionally for legacy services.

Do Airbnb DEs contribute to open source?
Many do. Because Airflow and Superset originated at Airbnb, contributing fixes and features upstream is part of the culture and is recognized at performance reviews when the work has clear business impact.

How long is the Airbnb DE interview loop?
Typically two to four weeks from recruiter screen to offer. The full loop is five to six rounds after the screen: phone screen, SQL deep-dive, modeling, pipeline design, and behavioral, with a possible cross-functional round for staff candidates.

What is the salary range for a senior data engineer at Airbnb?
Per Levels.fyi data, IC4 senior DEs in the Bay Area generally see total compensation in the $280K-$360K range, with base around $185K-$215K and the rest in RSUs and bonus. Numbers move with stock price and offer cycle.

Is Airbnb DE work remote-friendly in 2026?
Airbnb operates a Live and Work Anywhere policy that allows employees to work remotely within their country of employment, with periodic team gatherings. Most DE roles are eligible, though some senior roles favor Bay Area presence for partnership reasons.


About the author. Blake Crosley founded ResumeGeni and writes about data engineering, hiring technology, and ATS optimization. More writing at blakecrosley.com.