Diff two relations (tables, queries) off a primary key, reporting per-key status (identical/different/left_only/right_only), a JSON summary of changed columns, and per-column left/right values.
Installing and Loading
INSTALL duck_diff FROM community;
LOAD duck_diff;
Example
LOAD duck_diff;
-- Create two sample snapshots
CREATE TABLE users_v1 AS SELECT * FROM (VALUES
(1, 'Ada', '[email protected]', 100),
(2, 'Linus', '[email protected]', 50),
(3, 'Grace', '[email protected]', 75)
) t(id, name, email, credits);
CREATE TABLE users_v2 AS SELECT * FROM (VALUES
(1, 'Ada', '[email protected]', 120), -- credits changed
(2, 'Linus', '[email protected]', 50), -- unchanged
(4, 'Mike', '[email protected]', 10) -- new (id 3 removed)
) t(id, name, email, credits);
-- Diff them off the primary key
SELECT id, diff_status, diff_data, credits_left, credits_right
FROM table_diff('FROM users_v1', 'FROM users_v2', pk := 'id')
ORDER BY id;
-- Or get counts and percentages per status
SELECT * FROM table_diff_summary('FROM users_v1', 'FROM users_v2', pk := 'id');
About duck_diff
duck_diff diffs two relations off a primary key. Given a "left" and a "right" relation
(written as query strings, e.g. 'FROM orders' or a full SELECT ...), it reports — per
key — whether the row is identical, different, or exists only on one side, and
exactly what changed. Each result row carries both a JSON diff_data summary of the
changed columns and per-column expanded columns (<c>_left / <c>_right /
<c>_diff_status). Composite primary keys and selecting a subset of columns to diff or
ignore are supported.
Functions:
| Function | Returns | Purpose |
|---|---|---|
table_diff(left, right, pk := ...) |
table | one row per key: key column(s), diff_status, diff_data, expanded columns |
table_diff_summary(left, right, pk := ...) |
one row | counts (and percentages) per status |
schema_diff(left, right) |
table | per-column name/type comparison |
table_diff options include numeric_tolerance (treat numbers within a band as equal),
timestamp_precision (truncate timestamps before comparing), null_equals_empty,
columns / ignore (restrict or exclude compared columns), context (surface
non-compared columns alongside the diff), and require_matching_columns /
upcast_types for relations whose schemas don't match exactly.
Because each relation argument is a query string, any table function from another extension works — diff CSV/Parquet files, or relations from attached databases (Postgres, MySQL, BigQuery, …) against each other.
Use cases:
- Refactoring SQL (possibly onto a new database) and verifying the results are identical
- Capturing the differences between snapshots or points in time
- Replication integrity — spot-check that a replica matches its source
- Regression tests in CI — assert a model's output still matches its golden snapshot
- Giving coding agents a ground-truth check that a refactor produced identical results
See the function reference for full documentation.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| schema_diff | table | NULL | NULL | |
| table_diff | table | NULL | NULL | |
| table_diff_summary | table | NULL | NULL |
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
This extension does not add any settings.