Smart Development Intelligence for DuckDB - Git-aware data analysis capabilities that allow querying git history, accessing files at any revision, and performing version-aware data analysis with SQL.
Installing and Loading
INSTALL duck_tails FROM community;
LOAD duck_tails;
Example
-- Load the extension
LOAD 'duck_tails';
-- Query git history (defaults to current directory)
SELECT commit_hash, author_name, message, author_date
FROM git_log() LIMIT 5;
-- Access files from git repository at specific revisions
SELECT * FROM read_csv('git://data/sales.csv@HEAD');
-- Compare data between commits
SELECT
    (SELECT COUNT(*) FROM read_csv('git://data/sales.csv@HEAD')) AS current_count,
    (SELECT COUNT(*) FROM read_csv('git://data/sales.csv@HEAD~1')) AS previous_count;
-- Analyze text differences
SELECT * FROM read_git_diff('git://README.md@HEAD', 'git://README.md@HEAD~1');
About duck_tails
Duck Tails brings git-aware data analysis capabilities to DuckDB, enabling sophisticated version-controlled data workflows. The extension provides three core capabilities:
Git Filesystem Access: Use the git:// protocol to access any file in your git repository at any commit, branch, or tag. This allows you to query historical data states, compare versions, and perform temporal analysis directly in SQL.
Repository Metadata Queries: Query git repository information directly with table functions like git_log(), git_branches(), and git_tags(). Analyze commit histories, track development patterns, and integrate repository metadata into your analytical workflows.
Text Diff Analysis: Comprehensive text diffing capabilities with functions like diff_text(), read_git_diff(), and text_diff_stats(). Analyze changes between file versions, track configuration drift, and perform code change analysis.
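As a minimal sketch of the first two capabilities, the queries below use only the git_log() columns shown in the example above (author_name, author_date) and the git://path@revision form. The path data/sales.csv is the same illustrative path used earlier, and the second query assumes the file has the same columns at both revisions.

```sql
-- Commits per author, with the date of each author's most recent commit
SELECT author_name,
       COUNT(*)         AS commits,
       MAX(author_date) AS last_commit
FROM git_log()
GROUP BY author_name
ORDER BY commits DESC;

-- Rows present at HEAD that were absent one commit earlier
-- (assumes both revisions of the file share the same schema)
SELECT * FROM read_csv('git://data/sales.csv@HEAD')
EXCEPT
SELECT * FROM read_csv('git://data/sales.csv@HEAD~1');
```

Because git_log() is an ordinary table function, its output can be filtered, grouped, and joined like any other relation.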
Key functions include:
git_log([path]) - Query commit history
git_branches([path]) - List repository branches
git_tags([path]) - List repository tags
diff_text(old, new) - Compute text differences
read_git_diff(file1, [file2]) - Structured diff analysis
text_diff_lines(diff) - Parse diff into line-by-line changes
text_diff_stats(old, new) - Diff statistics and metrics
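The scalar diff functions can be tried on literal strings without a repository. The sketch below follows the (old, new) argument order listed above. The two configuration snippets are made up for illustration, and the exact shape of the returned values is not documented here, so no output is shown.

```sql
-- Two hypothetical versions of a small config file, built with chr(10) as the newline
WITH versions AS (
    SELECT 'timeout=30' || chr(10) || 'retries=3' AS old_text,
           'timeout=60' || chr(10) || 'retries=3' AS new_text
)
SELECT diff_text(old_text, new_text)       AS diff,   -- textual difference
       text_diff_stats(old_text, new_text) AS stats   -- summary metrics for the same change
FROM versions;
```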
The extension supports mixed file systems, allowing you to combine git://, local files, S3, and other DuckDB-supported protocols in a single query. It is built on libgit2 for robust git operations and includes comprehensive error handling.
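One way to combine protocols, sketched under the assumption that the working-tree copy of data/sales.csv sits in the current directory alongside the repository, is to compare the uncommitted local file against the committed version:

```sql
-- Rows in the local working copy that are not in the last committed version
-- (the local path is an assumption; adjust it to your repository layout)
SELECT * FROM read_csv('data/sales.csv')
EXCEPT
SELECT * FROM read_csv('git://data/sales.csv@HEAD');
```

An empty result suggests the local file adds no rows relative to HEAD. Swapping the two SELECT statements shows rows that were removed instead.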
Added Functions
function_name | function_type | description | comment | examples |
---|---|---|---|---|
diff_text | scalar | NULL | NULL | |
git_branches | table | NULL | NULL | |
git_log | table | NULL | NULL | |
git_tags | table | NULL | NULL | |
read_git_diff | table | NULL | NULL | |
text_diff | scalar | NULL | NULL | |
text_diff_lines | table | NULL | NULL | |
text_diff_stats | scalar | NULL | NULL | |