Search Shortcut cmd + k | ctrl + k
duck_diff

Diff two relations (tables, queries) off a primary key, reporting per-key status (identical/different/left_only/right_only), a JSON summary of changed columns, and per-column left/right values.

Maintainer(s): avaitla

Installing and Loading

INSTALL duck_diff FROM community;
LOAD duck_diff;

Example

LOAD duck_diff;

-- Create two sample snapshots
CREATE TABLE users_v1 AS SELECT * FROM (VALUES
    (1, 'Ada',   '[email protected]',   100),
    (2, 'Linus', '[email protected]',  50),
    (3, 'Grace', '[email protected]',  75)
) t(id, name, email, credits);

CREATE TABLE users_v2 AS SELECT * FROM (VALUES
    (1, 'Ada',   '[email protected]',   120),   -- credits changed
    (2, 'Linus', '[email protected]',  50),   -- unchanged
    (4, 'Mike',  '[email protected]',   10)    -- new (id 3 removed)
) t(id, name, email, credits);

-- Diff them off the primary key
SELECT id, diff_status, diff_data, credits_left, credits_right
FROM table_diff('FROM users_v1', 'FROM users_v2', pk := 'id')
ORDER BY id;

-- Or get counts and percentages per status
SELECT * FROM table_diff_summary('FROM users_v1', 'FROM users_v2', pk := 'id');

About duck_diff

duck_diff diffs two relations off a primary key. Given a "left" and a "right" relation (written as query strings, e.g. 'FROM orders' or a full SELECT ...), it reports — per key — whether the row is identical, different, or exists only on one side, and exactly what changed. Each result row carries both a JSON diff_data summary of the changed columns and per-column expanded columns (<c>_left / <c>_right / <c>_diff_status). Composite primary keys and selecting a subset of columns to diff or ignore are supported.

Functions:

Function Returns Purpose
table_diff(left, right, pk := ...) table one row per key: key column(s), diff_status, diff_data, expanded columns
table_diff_summary(left, right, pk := ...) one row counts (and percentages) per status
schema_diff(left, right) table per-column name/type comparison

table_diff options include numeric_tolerance (treat numbers within a band as equal), timestamp_precision (truncate timestamps before comparing), null_equals_empty, columns / ignore (restrict or exclude compared columns), context (surface non-compared columns alongside the diff), and require_matching_columns / upcast_types for relations whose schemas don't match exactly.

Because each relation argument is a query string, any table function from another extension works — diff CSV/Parquet files, or relations from attached databases (Postgres, MySQL, BigQuery, …) against each other.

Use cases:

  • Refactoring SQL (possibly onto a new database) and verifying the results are identical
  • Capturing the differences between snapshots or points in time
  • Replication integrity — spot-check that a replica matches its source
  • Regression tests in CI — assert a model's output still matches its golden snapshot
  • Giving coding agents a ground-truth check that a refactor produced identical results

See the function reference for full documentation.

Added Functions

function_name function_type description comment examples
schema_diff table NULL NULL  
table_diff table NULL NULL  
table_diff_summary table NULL NULL  

Overloaded Functions

This extension does not add any function overloads.

Added Types

This extension does not add any types.

Added Settings

This extension does not add any settings.