Data Engineer Hub

Data Quality and Observability

By Blake Crosley · Last verified 2026-04-30

In short

Data quality decides whether your warehouse is a decision-making asset or a liability. Barr Moses (co-founder of Monte Carlo) coined the term 'data downtime' to make the analogy with application downtime explicit: the hours or days when data is missing, late, or wrong, and the cascading damage to dashboards, ML features, and executive trust. Modern data quality has three layers. In-pipeline assertions: dbt tests that fail the build when an invariant breaks. Observability: passive monitoring of freshness, volume, schema, and distribution to catch the failures dbt tests cannot anticipate. Contracts: schemas and SLAs negotiated with upstream producers.
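
To make 'data downtime' concrete as a metric, here is a minimal Python sketch. It assumes each incident's downtime runs from the moment the data went bad until correct data was restored, split into time-to-detection and time-to-resolution; the `Incident` record, its field names, and the sample incidents are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; the field names are illustrative."""
    table: str
    started_at: datetime   # when the data first went missing, late, or wrong
    detected_at: datetime  # when a test, monitor, or person noticed
    resolved_at: datetime  # when correct data was restored


def _minutes(delta: timedelta) -> float:
    return delta.total_seconds() / 60


def downtime_breakdown(incidents: list[Incident]) -> dict[str, float]:
    """Total data downtime in minutes, split into detection and resolution time."""
    detect = timedelta()
    resolve = timedelta()
    for inc in incidents:
        detect += inc.detected_at - inc.started_at
        resolve += inc.resolved_at - inc.detected_at
    return {
        "detect": _minutes(detect),
        "resolve": _minutes(resolve),
        "total": _minutes(detect + resolve),
    }


t0 = datetime(2026, 4, 1, 6, 0)
incidents = [
    Incident("orders", t0, t0 + timedelta(hours=2), t0 + timedelta(hours=3)),
    Incident("customers", t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=45)),
]
print(downtime_breakdown(incidents))
# -> {'detect': 135.0, 'resolve': 90.0, 'total': 225.0}
```

Keeping the two components separate is a design choice: they respond to different levers, since observability mostly shortens detection while clear ownership and runbooks shorten resolution.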

Key takeaways

dbt tests on every primary key, foreign key, and accepted-values column are the non-negotiable baseline. Run them on Bronze and Silver, not just Gold.
Data observability (Monte Carlo, Soda, Anomalo, Bigeye) catches the freshness, volume, schema, and distribution failures that dbt tests cannot anticipate.
Barr Moses's 'data downtime' framing turned data quality into a measurable business metric: minutes of missing, stale, or wrong data, treated with the same severity as application downtime.
Lineage tools (Atlan, OpenMetadata, DataHub) answer 'what breaks if I change this column?' in minutes rather than days. Column-level lineage beats table-level for impact analysis.
Data contracts are the 2026 senior-DE bar: schemas, freshness SLAs, and ownership negotiated with upstream producers, validated at the gateway, versioned like APIs.
Pinterest's Data Quality Canvas and the Convoy/GoCardless contract case studies show 70-90% incident reductions when quality moves from downstream cleanup to upstream contract enforcement.
The wrong architecture: zero in-pipeline tests plus an expensive observability tool. The right architecture: dbt tests as assertions, observability for the unknown unknowns, contracts on the interfaces that matter.
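
Freshness is the simplest observability signal to hand-roll, which is why a few scheduled checks can cover it before a team buys a tool. A minimal Python sketch, assuming you can look up each table's most recent load time (for example, the maximum of a load-timestamp column); the table names and SLA values are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness SLAs: the maximum allowed age of each table's newest load.
FRESHNESS_SLAS: dict[str, timedelta] = {
    "gold.revenue_daily": timedelta(hours=2),
    "silver.orders": timedelta(hours=1),
}


def stale_tables(last_loaded: dict[str, datetime], now: datetime) -> list[str]:
    """Return the tables whose newest load is older than their SLA.

    `last_loaded` maps table name to its most recent load time. A table
    that has an SLA but no recorded load counts as stale.
    """
    breaches = []
    for table, max_age in FRESHNESS_SLAS.items():
        loaded_at = last_loaded.get(table)
        if loaded_at is None or now - loaded_at > max_age:
            breaches.append(table)
    return breaches


now = datetime.now(timezone.utc)
print(stale_tables({"gold.revenue_daily": now - timedelta(hours=3),
                    "silver.orders": now - timedelta(minutes=20)}, now))
# -> ['gold.revenue_daily']
```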

dbt tests as the baseline data-quality bar

Data observability beyond dbt: Monte Carlo, Soda, Anomalo

Data lineage tools and why they matter

Data contracts: the senior-DE bar in 2026

Frequently asked questions

Do I need both dbt tests and a data observability tool?

Yes, at meaningful scale, because they solve different problems. dbt tests catch the failures you can articulate as assertions (uniqueness, referential integrity, accepted values). Observability tools catch the failures you cannot anticipate: a 30% volume drop you never thought to test, a schema change in an upstream Salesforce field, a distribution shift after a marketing experiment. Before Series B, dbt tests plus a few cron-based freshness checks are usually enough; past Series B, with thousands of tables, an observability tool earns its keep.
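
Vendors use their own models, but the core idea behind catching a volume drop nobody wrote a test for can be sketched with a trailing-window z-score on daily row counts. This is a stand-in technique, not how any particular observability tool works; the 3.0 threshold and the sample counts are assumptions.

```python
from statistics import mean, stdev


def is_volume_anomaly(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """Flag today's row count if it is more than `z_threshold` sample standard
    deviations away from the mean of the trailing window in `history`."""
    if len(history) < 2:
        return False  # stdev needs at least two points; too little history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu  # a perfectly flat history makes any change notable
    return abs(today - mu) / sigma > z_threshold


# Hypothetical trailing week of daily row counts for one table.
last_week = [10_000, 10_200, 9_900, 10_100, 10_050, 9_950, 10_000]
print(is_volume_anomaly(last_week, 7_000))   # a 30% drop -> True
print(is_volume_anomaly(last_week, 10_100))  # a normal day -> False
```

A plain z-score ignores weekly seasonality such as weekend dips, which is one reason commercial tools fit richer models; treat this as the shape of the idea rather than a production detector.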

Where should data quality tests live in the pipeline?

On every layer that has a contract with downstream consumers. Bronze should test for ingest completeness and fidelity to the source system. Silver should test for deduplication, conformance, and referential integrity across sources. Gold should test the business-meaningful invariants: revenue equals the sum of line items, no quantity is negative, every order has a valid customer. Catching a violation in Bronze is 100x cheaper than catching it in the CEO's dashboard.
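
The per-layer checks above correspond to dbt's built-in generic tests (`unique`, `not_null`, `accepted_values`, `relationships`) plus custom tests for business invariants. As an illustration of the logic only, here is a plain-Python sketch over in-memory rows; it is not dbt code, and the row shapes and column names are hypothetical. Like a dbt test, each check returns the failing rows, and an empty result means the check passed.

```python
from collections import Counter, defaultdict


def unique(rows, col):
    """Rows whose `col` value appears more than once (like dbt's `unique`)."""
    counts = Counter(r[col] for r in rows)
    return [r for r in rows if counts[r[col]] > 1]


def not_null(rows, col):
    """Rows where `col` is missing or None (like dbt's `not_null`)."""
    return [r for r in rows if r.get(col) is None]


def accepted_values(rows, col, values):
    """Rows whose `col` falls outside the allowed set (like dbt's `accepted_values`)."""
    return [r for r in rows if r.get(col) not in values]


def relationships(rows, col, parent_rows, parent_col):
    """Rows whose foreign key has no match in the parent (like dbt's `relationships`).

    Null keys are skipped here and left to a separate not_null check.
    """
    parent_keys = {p[parent_col] for p in parent_rows}
    return [r for r in rows if r.get(col) is not None and r[col] not in parent_keys]


def non_negative(rows, col):
    """Gold-layer invariant: no negative quantities."""
    return [r for r in rows if r[col] < 0]


def revenue_matches_line_items(orders, line_items, tolerance=0.01):
    """Gold-layer invariant: each order's revenue equals the sum of its line items."""
    totals = defaultdict(float)
    for li in line_items:
        totals[li["order_id"]] += li["quantity"] * li["unit_price"]
    return [
        o for o in orders
        if abs(o["revenue"] - totals.get(o["order_id"], 0.0)) > tolerance
    ]
```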

How do I get application engineers to care about data contracts?

Three things shift the conversation. (1) Make the cost visible: track data incidents caused by upstream changes and report them in the same forum as application incidents. (2) Make contracts low-friction: a YAML file in their repo with a CI check is tolerable; a 30-minute meeting per schema change is not. (3) Tie contracts to outcomes they already care about: ML model regressions, executive dashboards going dark, regulatory reports needing rework. Contract adoption is a culture change, and the data team has to do the political work to make it stick.
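
The CI check itself can be small. A minimal Python sketch, assuming the contract and the producer's proposed schema have already been parsed (for example, from that YAML file) into `column -> type` dictionaries; the contract shape and the `orders` columns are hypothetical, not any particular contract tool's format.

```python
def breaking_changes(contract: dict[str, str], proposed: dict[str, str]) -> list[str]:
    """Compare a proposed schema against the contracted one.

    Removing or retyping a contracted column breaks downstream consumers;
    adding a new column does not, so additions are deliberately ignored.
    """
    problems = []
    for column, contracted_type in contract.items():
        if column not in proposed:
            problems.append(f"removed column: {column}")
        elif proposed[column] != contracted_type:
            problems.append(
                f"type change on {column}: {contracted_type} -> {proposed[column]}"
            )
    return problems


# Hypothetical contract for an `orders` table and a producer's proposed change.
contract = {"order_id": "bigint", "customer_id": "bigint", "amount": "numeric"}
proposed = {"order_id": "bigint", "amount": "varchar", "currency": "text"}
for problem in breaking_changes(contract, proposed):
    print(problem)
# A CI wrapper would exit non-zero whenever the list is non-empty.
```

Ignoring additions is the low-friction choice: new columns do not break existing consumers, so blocking them would add exactly the overhead that makes producers resent contracts.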

Is Great Expectations still relevant in 2026?

Yes, but in a narrower role than at its 2020 peak. dbt tests absorbed most in-warehouse assertion use cases, and observability tools absorbed monitoring. Great Expectations remains the right pick for batch validation outside the warehouse: validating vendor CSVs before load, asserting on Pandas DataFrames in ML pipelines, certifying data products across team boundaries. It is a library, not a platform, and it shines when expectation suites must travel with the data.
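
To show the "suite travels with the data" idea without relying on Great Expectations' own API, here is a stdlib-only Python sketch: a declarative suite of named expectations is evaluated against a vendor CSV before load. The suite format, the column names, and the `validate_csv` helper are hypothetical.

```python
import csv
import io
from typing import Callable

# Hypothetical suite format: (name, column, predicate on the raw string value).
# Because it is plain data plus small functions, it can ship alongside the file.
Expectation = tuple[str, str, Callable[[str], bool]]

SUITE: list[Expectation] = [
    ("sku is not empty", "sku", lambda v: v.strip() != ""),
    ("quantity is a non-negative integer", "quantity", lambda v: v.isdecimal()),
    ("currency is a 3-letter uppercase code", "currency",
     lambda v: len(v) == 3 and v.isalpha() and v.isupper()),
]


def validate_csv(text: str, suite: list[Expectation]) -> dict[str, list[int]]:
    """Return, per expectation, the 1-based data row numbers that fail it."""
    failures: dict[str, list[int]] = {name: [] for name, _, _ in suite}
    reader = csv.DictReader(io.StringIO(text))
    for row_num, row in enumerate(reader, start=1):
        for name, column, check in suite:
            value = row.get(column)  # None when the row is short or the column is absent
            if value is None or not check(value):
                failures[name].append(row_num)
    return failures


vendor_csv = "sku,quantity,currency\nA-1,3,USD\n,2,usd\nB-7,-1,EUR\n"
print(validate_csv(vendor_csv, SUITE))
```

A load job would refuse the file whenever any list in the result is non-empty, so bad vendor data never reaches Bronze.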

What is the right alerting threshold for data quality issues?

Two severity tiers, mapped to who gets paged. Errors block the build and page during business hours: primary-key violations, freshness SLA breaches on revenue-critical tables, schema incompatibilities. Warnings go to Slack and are reviewed weekly: distribution drift, soft volume anomalies, foreign-key checks configured to warn. The fastest way to destroy a quality program is to page on warnings; alert fatigue teaches the team to ignore real signals.
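
The two-tier policy is simple enough to encode directly. A minimal Python sketch, assuming each check result carries a severity of `error` or `warn` (the two severities dbt tests support) and a count of failing rows; the route names and the Monday-to-Friday, 09:00-17:59 paging window are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Route(Enum):
    BLOCK_AND_PAGE = "fail the build and page on-call now"
    BLOCK_AND_QUEUE_PAGE = "fail the build and page when business hours start"
    SLACK_WEEKLY = "post to Slack for the weekly review"


@dataclass
class CheckResult:
    name: str
    severity: str  # "error" or "warn"
    failures: int  # number of failing rows the check returned


# Hypothetical paging window: Monday-Friday, 09:00-17:59 local time.
BUSINESS_HOURS = range(9, 18)


def route(result: CheckResult, now: datetime) -> Optional[Route]:
    """Map a check result to an alert route; passing checks produce no alert."""
    if result.failures == 0:
        return None
    if result.severity == "warn":
        # Warnings never page: paging on them is how alert fatigue starts.
        return Route.SLACK_WEEKLY
    if result.severity != "error":
        raise ValueError(f"unknown severity {result.severity!r} on {result.name}")
    in_business_hours = now.weekday() < 5 and now.hour in BUSINESS_HOURS
    return Route.BLOCK_AND_PAGE if in_business_hours else Route.BLOCK_AND_QUEUE_PAGE
```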

How do lineage tools compare to just reading the dbt DAG?

The dbt DAG covers the warehouse but stops at the sources and the dashboards. A dedicated lineage tool extends upstream (application databases, Kafka topics, SaaS sources) and downstream (BI tools, reverse-ETL syncs, ML feature stores). For a small team with one warehouse, dbt docs is enough. With 20+ source systems, multiple BI tools, and reverse-ETL, a dedicated lineage tool is where end-to-end impact analysis lives.
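
End-to-end impact analysis is, at its core, a graph traversal. A minimal Python sketch, assuming column-level lineage edges have already been collected from sources, the warehouse, and downstream tools into one adjacency map; every node name below is hypothetical. A breadth-first search from the changed column returns everything downstream of it.

```python
from collections import deque

# Hypothetical column-level lineage: each node maps to the nodes that read from it.
LINEAGE: dict[str, list[str]] = {
    "app_db.orders.amount": ["bronze.orders.amount"],
    "bronze.orders.amount": ["silver.orders.amount_usd"],
    "silver.orders.amount_usd": [
        "gold.revenue_daily.revenue",
        "ml.features.avg_order_value",
    ],
    "gold.revenue_daily.revenue": [
        "bi.exec_dashboard.revenue_tile",
        "reverse_etl.crm.account_revenue",
    ],
}


def downstream_impact(changed: str, lineage: dict[str, list[str]]) -> list[str]:
    """Breadth-first search returning every node downstream of `changed`, nearest first."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guards against cycles and diamond-shaped graphs
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream_impact("app_db.orders.amount", LINEAGE))
```

With only table-level edges, the same query would flag every consumer of `silver.orders`, not just those reading `amount_usd`; that difference is why column-level lineage gives tighter impact analysis.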

What's the Pinterest Data Quality Canvas?

A framework Pinterest's data engineering team published for structuring quality work across producers, consumers, and platform teams. It maps quality dimensions (completeness, accuracy, consistency, timeliness, validity, uniqueness) against responsible owners and surfaces the gaps where nobody owns a failure mode. It is the canonical template for treating quality as a shared discipline.
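
The gap-finding step of such a canvas amounts to checking a dimension-by-owner matrix for empty cells. A minimal Python sketch; only the six dimensions come from the description above, while the assets and owner assignments are hypothetical.

```python
DIMENSIONS = ("completeness", "accuracy", "consistency", "timeliness", "validity", "uniqueness")

# Hypothetical assignments: for each data asset, which team owns each quality dimension.
OWNERSHIP: dict[str, dict[str, str]] = {
    "orders": {
        "completeness": "producer: checkout team",
        "accuracy": "producer: checkout team",
        "timeliness": "platform: data infra",
        "uniqueness": "consumer: analytics",
    },
    "customers": {dim: "producer: accounts team" for dim in DIMENSIONS},
}


def ownership_gaps(ownership: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return, per asset, the quality dimensions nobody owns."""
    gaps = {}
    for asset, owners in ownership.items():
        missing = [dim for dim in DIMENSIONS if not owners.get(dim)]
        if missing:
            gaps[asset] = missing
    return gaps


print(ownership_gaps(OWNERSHIP))
# -> {'orders': ['consistency', 'validity']}
```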

About the author. Blake Crosley founded ResumeGeni and writes about data engineering, hiring technology, and ATS optimization. More writing at blakecrosley.com.