GCP Data Engineer
Role & responsibilities
• SQL (advanced joins, window functions, optimization)
• Google Cloud Platform (GCP)
• BigQuery (partitioning, clustering, cost optimization)
• Python
• Dataproc (PySpark / Spark)
• Cloud Storage (GCS)
• Cloud Composer (Airflow)
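As a rough illustration of the window-function skill listed above: a very common BigQuery pattern is deduplicating to the latest row per key with `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC) = 1`. A minimal plain-Python sketch of the same logic (the column names `id` and `updated_at` are hypothetical):

```python
from itertools import groupby
from operator import itemgetter

def latest_per_key(rows, key="id", order_by="updated_at"):
    """Keep the most recent row per key -- the same result as the
    ROW_NUMBER() OVER (PARTITION BY key ORDER BY order_by DESC) = 1
    dedup idiom used in BigQuery SQL."""
    rows = sorted(rows, key=itemgetter(key))  # groupby needs sorted input
    return [
        max(group, key=itemgetter(order_by))  # latest row within each key
        for _, group in groupby(rows, key=itemgetter(key))
    ]

# Hypothetical change-log rows: id 1 has two versions, id 2 has one.
orders = [
    {"id": 1, "updated_at": "2024-01-01", "status": "new"},
    {"id": 1, "updated_at": "2024-02-01", "status": "shipped"},
    {"id": 2, "updated_at": "2024-01-15", "status": "new"},
]
print(latest_per_key(orders))
```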
Additional GCP Services:
• Pub/Sub
• Cloud Functions / Cloud Run
• Dataflow (Apache Beam)
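For the streaming services above, a typical integration is a Cloud Functions / Cloud Run handler receiving Pub/Sub push messages. Pub/Sub delivers push messages as a JSON envelope with the event payload base64-encoded under `message.data`; the sketch below decodes that envelope (the inner payload schema and names here are hypothetical):

```python
import base64
import json

def decode_push_envelope(envelope: dict) -> dict:
    """Extract and decode the payload from a Pub/Sub push envelope.

    Pub/Sub push delivery wraps the published bytes as base64 text
    in message.data; the payload itself is assumed to be JSON."""
    message = envelope["message"]
    payload = base64.b64decode(message["data"]).decode("utf-8")
    return json.loads(payload)

# Simulated push envelope (the inner payload schema is hypothetical).
envelope = {
    "message": {
        "data": base64.b64encode(
            json.dumps({"event": "file_landed", "uri": "gs://bucket/raw/x.csv"}).encode()
        ).decode(),
        "messageId": "1234567890",
    },
    "subscription": "projects/demo/subscriptions/demo-sub",
}
print(decode_push_envelope(envelope))
```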
DevOps / Tools:
• Git / GitHub
• CI/CD pipelines
• Linux / Shell scripting
Responsibilities:
• Develop Python- and PySpark-based ETL/ELT pipelines
• Build and optimize Dataproc Spark jobs
• Optimize BigQuery SQL and manage costs
• Orchestrate workflows using Cloud Composer
• Implement data validation, monitoring, and error handling
• Handle incremental and batch data processing
• Perform Hadoop to GCP migration
• Manage metadata and tune performance
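The incremental-processing responsibility above is typically implemented with a high-watermark: each run loads only rows newer than the maximum event timestamp the previous run processed. A minimal, backend-agnostic sketch (the record fields and in-memory watermark are hypothetical; in practice the watermark would live in a state table or Composer variable):

```python
from datetime import datetime

def incremental_batch(records, watermark):
    """Return records newer than the watermark, plus the new watermark.

    `watermark` is the max event timestamp seen by the previous run;
    rows at or before it are assumed already loaded."""
    fresh = [r for r in records if r["event_ts"] > watermark]
    # If nothing new arrived, the watermark stays where it was.
    new_watermark = max((r["event_ts"] for r in fresh), default=watermark)
    return fresh, new_watermark

# Hypothetical source rows spanning one morning.
records = [
    {"id": 1, "event_ts": datetime(2024, 5, 1, 8, 0)},
    {"id": 2, "event_ts": datetime(2024, 5, 1, 9, 30)},
    {"id": 3, "event_ts": datetime(2024, 5, 1, 11, 0)},
]
fresh, wm = incremental_batch(records, datetime(2024, 5, 1, 9, 0))
print(len(fresh), wm)  # rows 2 and 3 are newer than the 09:00 watermark
```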
Preferred candidate profile