Story · March 21, 2026

Why We Built Provero

Data quality tools have a pattern: start open source, gain traction, then change the license. The checks that worked yesterday now require an enterprise contract. Provero exists because we think data quality should stay open.

The problem with existing tools

Great Expectations is powerful but requires hundreds of lines of Python to define what should be a simple check. Soda Core started as a clean YAML-first tool, then moved to the Elastic License v2, locking features behind a commercial product. Monte Carlo and similar tools are SaaS-only, starting at five figures per year.

The common thread: you pay with complexity, pay with money, or pay with vendor lock-in. Usually all three.

What we wanted

Declarative. Define checks in YAML, not Python classes. Three lines to validate a column, not thirty.

Truly open source. Apache License 2.0. No asterisks, no feature gates, no license changes down the road.

Fast. One million rows checked in under 50 ms. The engine compiles N checks into a single SQL query rather than issuing one query per check.

Portable. DuckDB, PostgreSQL, Snowflake, BigQuery, MySQL, Redshift. Same YAML, any database.
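
To make the "N checks, one SQL query" idea concrete, here is a minimal sketch of how such a compiler could work. It is not Provero's actual engine: the tuple-based check format, the function names, and the use of Python's built-in sqlite3 module (chosen only so the example runs without extra installs) are all assumptions for illustration. Each check becomes one aggregate expression that counts failures, and all expressions share a single SELECT.

```python
import sqlite3

# Hypothetical check specs for illustration; not Provero's internal format.
CHECKS = [
    ("not_null", "order_id"),
    ("not_null", "amount"),
    ("unique", "order_id"),
    ("range", "amount", 0, 100000),
]


def compile_checks(table, checks):
    """Build one SELECT where each output column counts one check's failures."""
    exprs = []
    for i, check in enumerate(checks):
        kind, col = check[0], check[1]
        if kind == "not_null":
            expr = f"COALESCE(SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END), 0)"
        elif kind == "unique":
            # Non-null values minus distinct values = surplus duplicate rows.
            expr = f"COUNT({col}) - COUNT(DISTINCT {col})"
        elif kind == "range":
            lo, hi = check[2], check[3]
            expr = (
                f"COALESCE(SUM(CASE WHEN {col} < {lo} OR {col} > {hi} "
                f"THEN 1 ELSE 0 END), 0)"
            )
        else:
            raise ValueError(f"unknown check type: {kind}")
        exprs.append(f"{expr} AS check_{i}")
    # Identifiers are interpolated directly to keep the sketch short;
    # real code would validate or quote them.
    return f"SELECT {', '.join(exprs)} FROM {table}"


def run_checks(conn, table, checks):
    """Execute the compiled query once and map each check to its failure count."""
    row = conn.execute(compile_checks(table, checks)).fetchone()
    return {f"{c[0]}({c[1]})": failures for c, failures in zip(checks, row)}


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10, 50.0), (2, 11, None), (2, 12, 250000.0), (3, 13, 20.0)],
    )
    return run_checks(conn, "orders", CHECKS)


if __name__ == "__main__":
    print(demo())
```

In this sketch the database evaluates all four checks in one statement, so adding a check adds an expression to the query instead of another round trip.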

What Provero does today

Provero ships with 16 check types, from basics like not_null and unique to advanced checks like referential_integrity (foreign-key validation between tables) and anomaly (Z-score, MAD, and IQR detection with zero external dependencies).

# provero.yaml
source:
  type: duckdb
  table: orders

checks:
  - not_null: [order_id, customer_id, amount]
  - unique: order_id
  - range:
      column: amount
      min: 0
      max: 100000

Run provero run to get a colored table with pass/fail results, a quality score, and queries for the failing rows. Run provero watch --interval 5m for continuous monitoring. Export to dbt with provero export dbt.
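
For readers unfamiliar with the three methods behind the anomaly check, the sketch below shows what Z-score, MAD, and IQR detection can look like using only Python's standard library. It is an illustration, not Provero's implementation: the function names are made up, and the thresholds (3.0, 3.5, and 1.5) are common textbook conventions rather than Provero's documented defaults.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mean) / stdev > threshold]


def mad_outliers(values, threshold=3.5):
    """Flag values using the modified Z-score based on median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # more than half the values equal the median
    # 0.6745 rescales MAD so it is comparable to a standard deviation
    # when the data is roughly normal.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical daily row counts with one obvious spike.
ROW_COUNTS = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99,
              100, 102, 98, 101, 99, 103, 97, 100, 101, 500]

if __name__ == "__main__":
    for detect in (zscore_outliers, mad_outliers, iqr_outliers):
        print(detect.__name__, detect(ROW_COUNTS))
```

The median-based methods (MAD and IQR) are less affected by the outlier itself than the mean-based Z-score, which is one reason a tool might offer all three.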

Migrating from Soda

When Soda moved to ELv2, teams using the open source version faced a choice: accept the new license terms, or find an alternative. If you have SodaCL checks, converting them takes one command:

$ provero import soda soda_checks.yaml -o provero.yaml

The converter maps missing_count to not_null, duplicate_count to unique, and so on. Unsupported checks are included as comments.
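
The mapping step can be pictured with a small sketch. This is not the converter's real code: the function name, the regular expression, and the exact wording of the emitted comment are assumptions, and only the two mappings named above are included. It assumes SodaCL checks written as metric(column) = 0 and emits lines in the same shape as the provero.yaml example earlier in this post.

```python
import re

# Only the two mappings named in the text; templates follow the YAML shapes
# shown in the provero.yaml example (list for not_null, scalar for unique).
SODA_TO_PROVERO = {
    "missing_count": "not_null: [{col}]",
    "duplicate_count": "unique: {col}",
}

# Matches SodaCL checks of the form "metric(column) = 0".
ZERO_CHECK = re.compile(r"^(\w+)\((\w+)\)\s*=\s*0$")


def convert_checks(soda_checks):
    """Translate a list of SodaCL check strings into Provero-style YAML text."""
    lines = ["checks:"]
    for raw in soda_checks:
        check = raw.strip()
        match = ZERO_CHECK.match(check)
        if match and match.group(1) in SODA_TO_PROVERO:
            template = SODA_TO_PROVERO[match.group(1)]
            lines.append("  - " + template.format(col=match.group(2)))
        else:
            # Keep anything unrecognized as a comment so it is not lost.
            lines.append(f"  # unsupported: {check}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(convert_checks([
        "missing_count(order_id) = 0",
        "duplicate_count(order_id) = 0",
        "freshness(created_at) < 1d",
    ]))
```

Keeping unconverted checks as comments means the output file still records every check from the original, which makes a manual follow-up pass easier.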

Get started

$ pip install provero
$ provero init
$ provero run