duck_diff

Search Shortcut cmd + k | ctrl + k

Documentation

duck_diff

Downloads 624this week

GitHub stars 5

Extension repository on GitHub

Extension descriptor (YAML)

Diff two relations (tables, queries) off a primary key, reporting per-key status (identical/different/left_only/right_only), a JSON summary of changed columns, and per-column left/right values.

Maintainer(s): avaitla

Installing and Loading

INSTALL duck_diff FROM community;
LOAD duck_diff;

Example

LOAD duck_diff;

-- Create two sample snapshots
CREATE TABLE users_v1 AS SELECT * FROM (VALUES
    (1, 'Ada',   '[email protected]',   100),
    (2, 'Linus', '[email protected]',  50),
    (3, 'Grace', '[email protected]',  75)
) t(id, name, email, credits);

CREATE TABLE users_v2 AS SELECT * FROM (VALUES
    (1, 'Ada',   '[email protected]',   120),   -- credits changed
    (2, 'Linus', '[email protected]',  50),   -- unchanged
    (4, 'Mike',  '[email protected]',   10)    -- new (id 3 removed)
) t(id, name, email, credits);

-- Diff them off the primary key
SELECT id, diff_status, diff_data, credits_left, credits_right
FROM table_diff('FROM users_v1', 'FROM users_v2', pk := 'id')
ORDER BY id;

-- Or get counts and percentages per status
SELECT * FROM table_diff_summary('FROM users_v1', 'FROM users_v2', pk := 'id');

duck_diff diffs two relations off a primary key. Given a "left" and a "right" relation (written as query strings, e.g. 'FROM orders' or a full SELECT ...), it reports — per key — whether the row is identical, different, or exists only on one side, and exactly what changed. Each result row carries both a JSON diff_data summary of the changed columns and per-column expanded columns (<c>_left / <c>_right / <c>_diff_status). Composite primary keys and selecting a subset of columns to diff or ignore are supported.

Functions:

Function	Returns	Purpose
`table_diff(left, right, pk := ...)`	table	one row per key: key column(s), `diff_status`, `diff_data`, expanded columns
`table_diff_summary(left, right, pk := ...)`	one row	counts (and percentages) per status
`schema_diff(left, right)`	table	per-column name/type comparison

table_diff options include numeric_tolerance (treat numbers within a band as equal), timestamp_precision (truncate timestamps before comparing), null_equals_empty, columns / ignore (restrict or exclude compared columns), context (surface non-compared columns alongside the diff), and require_matching_columns / upcast_types for relations whose schemas don't match exactly.

Because each relation argument is a query string, any table function from another extension works — diff CSV/Parquet files, or relations from attached databases (Postgres, MySQL, BigQuery, …) against each other.

Use cases:

Refactoring SQL (possibly onto a new database) and verifying the results are identical
Capturing the differences between snapshots or points in time
Replication integrity — spot-check that a replica matches its source
Regression tests in CI — assert a model's output still matches its golden snapshot
Giving coding agents a ground-truth check that a refactor produced identical results

See the function reference for full documentation.

Added Functions

function_name	function_type	description	comment
schema_diff	table	NULL	NULL
table_diff	table	NULL	NULL
table_diff_summary	table	NULL	NULL

Installing and Loading

Example

About duck_diff

Added Functions

Overloaded Functions

Added Types

Added Settings

In this article