Data Platform Engineer
Description
We're seeking an exceptional Data Platform Engineer with deep expertise in Apache Airflow to build, scale, and maintain our data orchestration platform. This is a platform engineering role: you'll build the infrastructure and tooling that enables other data engineers to orchestrate their workflows, rather than building data pipelines yourself.
Responsibilities
Airflow Platform Development
Design, build, and maintain highly scalable Apache Airflow platform infrastructure
Dive deep into Airflow internals to customize and extend core functionality
Develop custom Airflow operators, sensors, hooks, and plugins for organizational use (see the operator sketch after this list)
Build internal frameworks and abstractions on top of Airflow to simplify DAG authoring
Modify Airflow source code when necessary to meet specific platform requirements
Create standardized patterns and reusable components for data teams
Contribute to the Apache Airflow open-source community
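As an illustration of the reusable components described above, a custom operator might look like the following minimal sketch (the class name, connection ID, and check logic are hypothetical, not an existing internal standard; assumes Airflow 2.x):

```python
from airflow.models.baseoperator import BaseOperator


class DataQualityCheckOperator(BaseOperator):
    """Fail the task when a table is empty for the run's logical date (illustrative)."""

    template_fields = ("table",)  # allow Jinja templating of the table name

    def __init__(self, *, table: str, conn_id: str = "warehouse_default", **kwargs):
        super().__init__(**kwargs)
        self.table = table
        self.conn_id = conn_id

    def execute(self, context):
        # A real implementation would build a provider hook from self.conn_id
        # and run the count against the warehouse.
        self.log.info("Checking row count for %s (run %s)", self.table, context["ds"])
        row_count = self._fetch_row_count()
        if row_count == 0:
            raise ValueError(f"{self.table} is empty for {context['ds']}")
        return row_count

    def _fetch_row_count(self) -> int:
        raise NotImplementedError("wire the warehouse hook in here")
```

Packaging operators like this into a versioned internal library keeps DAG authors on a single, reviewed implementation.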
Infrastructure Scalability
Deploy and manage Airflow on Kubernetes at scale
Optimize Airflow performance for thousands of concurrent DAGs and tasks
Design and implement multi-tenancy and isolation strategies
Build auto-scaling capabilities for executor resources (see the sketch after this list)
Architect high-availability and disaster recovery solutions
Manage Airflow metadata database performance and scaling
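As one sketch of executor auto-scaling, per-task pod overrides let the platform expose right-sized resource requests that a cluster autoscaler can act on. This assumes Airflow 2.4+ on the KubernetesExecutor; the DAG, task, and resource values are illustrative:

```python
import pendulum
from kubernetes.client import models as k8s

from airflow.decorators import dag, task


@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def resource_aware_example():
    @task(
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",  # must match the Airflow worker container name
                            resources=k8s.V1ResourceRequirements(
                                requests={"cpu": "500m", "memory": "1Gi"},
                                limits={"cpu": "2", "memory": "4Gi"},
                            ),
                        )
                    ]
                )
            )
        }
    )
    def heavy_transform():
        # placeholder for a memory-hungry task body
        return "done"

    heavy_transform()


resource_aware_example()
```

With requests declared per task, a node autoscaler can add and remove worker capacity as the task mix changes.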
Platform Reliability Operations
Implement comprehensive monitoring, alerting, and observability for the platform
Troubleshoot complex Airflow internals and distributed system issues
Build self-service capabilities and guardrails for platform users (see the cluster-policy sketch after this list)
Create tooling for platform debugging, profiling, and diagnostics
Establish SLAs and ensure platform reliability (uptime, latency)
Plan and execute zero-downtime upgrades and migrations
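One concrete form these guardrails can take is an Airflow cluster policy. The sketch below assumes Airflow 2.x; the tag rule and default thresholds are illustrative, not established platform policy:

```python
# airflow_local_settings.py -- discovered from $AIRFLOW_HOME/config or PYTHONPATH
from datetime import timedelta

from airflow.exceptions import AirflowClusterPolicyViolation
from airflow.models.baseoperator import BaseOperator
from airflow.models.dag import DAG


def dag_policy(dag: DAG) -> None:
    """Reject DAGs at parse time if they would undermine platform reliability targets."""
    if not dag.tags:
        raise AirflowClusterPolicyViolation(
            f"DAG {dag.dag_id} must declare an owning-team tag"
        )


def task_policy(task: BaseOperator) -> None:
    """Apply platform defaults rather than trusting every DAG author to set them."""
    if task.retries == 0:
        task.retries = 2
    if task.execution_timeout is None:
        task.execution_timeout = timedelta(hours=6)
```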
Integration Ecosystem
Build integrations with Spark, EMR, Kubernetes, and the Hadoop ecosystem
Develop authentication and authorization frameworks (RBAC, SSO)
Integrate with CI/CD systems for DAG deployment automation
Connect Airflow with observability stack (metrics, logs, traces)
Build APIs and CLIs for platform management and automation (see the sketch below)
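A small example of that kind of tooling, built on Airflow's stable REST API (/api/v1); it assumes the API is enabled with the basic-auth backend, and the base URL, credentials, and pause-by-tag behaviour are illustrative:

```python
"""Tiny management CLI sketch over Airflow's stable REST API."""
import argparse
import os

import requests

BASE_URL = os.environ.get("AIRFLOW_API", "http://localhost:8080/api/v1")
AUTH = (os.environ["AIRFLOW_USER"], os.environ["AIRFLOW_PASSWORD"])


def pause_dags_by_tag(tag: str, paused: bool = True) -> None:
    """Pause (or unpause) every DAG carrying the given tag."""
    resp = requests.get(f"{BASE_URL}/dags", params={"tags": tag, "limit": 100}, auth=AUTH)
    resp.raise_for_status()
    for dag in resp.json()["dags"]:
        patch = requests.patch(
            f"{BASE_URL}/dags/{dag['dag_id']}",
            json={"is_paused": paused},
            auth=AUTH,
        )
        patch.raise_for_status()
        print(f"{dag['dag_id']}: is_paused={paused}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Pause/unpause DAGs by tag")
    parser.add_argument("tag")
    parser.add_argument("--unpause", action="store_true")
    args = parser.parse_args()
    pause_dags_by_tag(args.tag, paused=not args.unpause)
```

Wrapping endpoints like these in a platform CLI keeps bulk operations scriptable and auditable.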
Developer Experience
Create documentation, best practices, and architectural guidelines
Build developer tooling (CLI tools, testing frameworks, DAG validators; see the validator sketch after this list)
Provide technical consultation to data engineering teams
Conduct code reviews for platform-related contributions
Evangelize platform capabilities and gather user feedback
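As a sketch of what a platform-provided DAG validator can look like (the dags/ path and the owning-team tag convention are illustrative), a CI job can load every DAG with DagBag and fail on import errors or missing conventions:

```python
import pytest

from airflow.models import DagBag


@pytest.fixture(scope="session")
def dagbag() -> DagBag:
    # Parse the repository's DAGs exactly as the scheduler would
    return DagBag(dag_folder="dags/", include_examples=False)


def test_no_import_errors(dagbag):
    assert dagbag.import_errors == {}, f"DAG import failures: {dagbag.import_errors}"


def test_every_dag_declares_an_owner_tag(dagbag):
    untagged = [dag_id for dag_id, dag in dagbag.dags.items() if not dag.tags]
    assert not untagged, f"DAGs missing an owning-team tag: {untagged}"
```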
Minimum Qualifications
Deep Airflow Expertise
5+ years in Platform/Infrastructure Engineering or Data Platform Engineering
3+ years of deep, hands-on experience with Apache Airflow internals:
Understanding of Airflow architecture components (scheduler, executor, webserver, metadata DB)
Experience customizing and extending Airflow core (not just using it)
Knowledge of executor implementations (Local, Celery, Kubernetes executors)
Understanding of Airflow's DAG parsing, scheduling, and execution model
Experience with Airflow plugin development and custom operators (see the plugin sketch after this list)
Ability to read and modify Airflow source code
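For context on the plugin mechanism mentioned above, a minimal Airflow plugin registers extensions such as Jinja macros with the running platform; the plugin name and macro below are purely illustrative:

```python
from airflow.plugins_manager import AirflowPlugin


def latest_partition(table: str) -> str:
    """Illustrative macro; a real one would look the partition up in the metastore."""
    return f"{table}/latest"


class OrgPlatformPlugin(AirflowPlugin):
    # Exposes {{ macros.org_platform_plugin.latest_partition(...) }} inside templates
    name = "org_platform_plugin"
    macros = [latest_partition]
```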
Infrastructure Platform Skills
Expert-level Python (advanced programming, not just scripting)
Strong Java proficiency for Spark/Hadoop integrations
Production experience with Kubernetes (deployments, operators, Helm)
Deep understanding of containerization (Docker, multi-stage builds)
Experience with AWS EMR cluster management and APIs
Knowledge of Hadoop ecosystem architecture (HDFS, YARN, resource managers)
Experience with Apache Spark architecture and cluster modes
Platform Engineering
Distributed systems concepts and design patterns
Database performance tuning (PostgreSQL/MySQL for Airflow metadata)
Message queuing systems (e.g., RabbitMQ, Redis)
Infrastructure as Code (Terraform, CloudFormation, Pulumi)
CI/CD systems (Jenkins, GitLab CI, GitHub Actions)
Monitoring and observability (Prometheus, Grafana, ELK, Datadog)
Software Engineering
Strong software design principles and architectural patterns
Experience building frameworks, libraries, and developer tools
Test-driven development and comprehensive testing strategies
Version control and collaborative development practices
API design and development (REST, gRPC)
Performance profiling and optimization
Preferred Qualifications
Active contributions to Apache Airflow open-source project
Experience running Airflow at massive scale (1000+ DAGs, 100K+ daily tasks)
Experience building multi-tenant data platforms
Experience with GitOps and declarative infrastructure
Background in SRE or platform reliability engineering
Experience in digital advertising or high-scale data platforms