Senior Data Engineer
The Data Engineering team builds and operates the analytical data platform that powers machine learning, data science, analytics, and reporting across Veriff. We are responsible for large-scale data ingestion, platform reliability, and enterprise data governance — ensuring Veriffians have access to accurate and timely data.
In this role, you will own and evolve our data lake and data warehouse infrastructure, driving platform-level data management, governance, and reliability at scale.
You'll help us protect honest people online by:
- Owning and evolving our data lake and data warehouse infrastructure using technologies such as Spark, Apache Iceberg, S3, Trino/Athena, and Redshift.
- Designing and maintaining platform-level data transformation pipelines in Python and SQL — focused on schema evolution, partitioning, compaction, and deduplication.
- Implementing optimized storage formats (Parquet, Avro, ORC), partitioning strategies, and indexing to improve query performance and reduce platform costs.
- Driving data governance initiatives — PII detection and classification, access control policies, data cataloging, lineage tracking, and data quality frameworks.
- Ensuring the availability, reliability, and cost efficiency of the data platform, including observability, monitoring, and alerting for pipeline and query engine health.
- Collaborating with ML, analytics, product, and engineering teams to define data contracts, maintain schema consistency, and provide clean, well-governed datasets.
- Contributing to disaster recovery strategy and multi-region reliability of the data platform.
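To give a flavor of the deduplication and partitioning work described above, here is a minimal, purely illustrative sketch (plain Python standing in for a Spark job; record fields like `id`, `updated_at`, and `event_date` are hypothetical):

```python
from collections import defaultdict

def dedupe_and_partition(records):
    """Keep the latest version of each record id, then bucket by event date.

    Mirrors, in miniature, the dedup + partitioning step a warehouse
    pipeline performs before writing partitioned Parquet files.
    """
    latest = {}
    for rec in records:
        rid = rec["id"]
        # Deduplication: keep the record with the greatest updated_at per id.
        if rid not in latest or rec["updated_at"] > latest[rid]["updated_at"]:
            latest[rid] = rec

    partitions = defaultdict(list)
    for rec in latest.values():
        # Partitioning: bucket by event date, as a data lake write
        # would by partition directory.
        partitions[rec["event_date"]].append(rec)
    return dict(partitions)
```

In a real pipeline the same logic would run distributed (e.g. `dropDuplicates` plus a partitioned write in PySpark), but the shape of the problem is the same.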
You are the right future Veriffian for the job if you have:
- Strong experience with Python, SQL, and Apache Spark / PySpark for large-scale data processing.
- Deep knowledge of modern analytics platform architecture — object stores, columnar and row-based data formats (Parquet, Avro, ORC), orchestration tools, analytical query engines, schema registries, and data catalogs.
- Experience with data governance and data management at scale — PII handling, data cataloging, schema management, access control, and data quality frameworks.
- Experience designing and operating data lake and data warehouse infrastructure.
- Solid understanding of storage optimization — partitioning, compaction, and compression trade-offs.
- Experience building observability, monitoring, and alerting for data platforms.
- Strong problem-solving skills and comfort working with ambiguity — defining problems before solving them.
- A collaborative mindset — this role serves ML, analytics, and product teams as internal customers.
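The compaction trade-off mentioned above (many small files versus fewer large ones) can be sketched as a toy planning step; this hypothetical example greedily groups small files into target-sized compaction batches:

```python
def plan_compaction(file_sizes, target_bytes):
    """Greedy grouping of file sizes into compaction batches.

    Each batch stays at or under target_bytes where possible; a single
    file larger than the target forms its own batch.
    """
    groups, current, current_size = [], [], 0
    for size in sorted(file_sizes, reverse=True):
        # Close the current batch once adding this file would overflow it.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

Table formats such as Iceberg ship their own compaction machinery, of course; the point of the sketch is only the cost model: fewer, larger files mean fewer object-store requests and less per-file query overhead.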
You're an especially awesome match if you have:
- Experience with Infrastructure as Code (IaC) and Terraform.
- Familiarity with containerization — Docker and Kubernetes.
- Experience with CI/CD pipelines for data platform deployments.
- Knowledge of data lake table formats beyond Iceberg (Delta Lake, Hudi).
- Familiarity with data catalog and metadata management tools (e.g., DataHub, Amundsen, AWS Glue Catalog).
- Understanding of data privacy regulations (GDPR) in the context of data engineering.
- Experience building streaming data pipelines.
- Experience with the AWS data stack.
Here's what we offer in return:
- Flexibility to work from home
- Stock options that ensure your share in our success
- Extra recharge days on top of your annual vacation
- Comprehensive relocation support to Estonia or Spain
- Extensive medical, dental, and vision insurance to ensure you’re feeling great physically and mentally
- Learning & Development and Health & Sports budgets that you are free to tailor to your own needs
- Four weeks of fully paid sabbatical leave after reaching your 5th work anniversary