Navigate and explore the filesystem using SQL
Installing and Loading
INSTALL hostfs FROM community;
LOAD hostfs;
Example
-- Navigate to the workspace and list the files
D PRAGMA cd('/Users/paul/workspace');
D PRAGMA ls;
┌───────────────────────────────┐
│ path │
│ varchar │
├───────────────────────────────┤
│ ./duckdb │
│ ./playground │
│ ./hostfs │
-- Find the files you were working on last
D SELECT path, file_last_modified(path) AS date FROM ls() WHERE 'csv' IN file_extension(path) ORDER BY date LIMIT 1 ;
┌───────────────────────────┬─────────────────────┐
│ path │ date │
│ varchar │ timestamp │
├───────────────────────────┼─────────────────────┤
│ ./sketch_results_join.csv │ 2024-07-13 23:25:48 │
└───────────────────────────┴─────────────────────┘
-- List the top 3 file types by total size, with file count, ordered by size.
D SELECT size, count, file_extension AS "type"
FROM (
SELECT SUM(file_size(path)) AS size_raw, format_bytes(size_raw) AS size, COUNT(*) AS count, file_extension(path) AS file_extension
FROM lsr('/Users/paul/workspace', 10)
GROUP BY file_extension(path)
) AS subquery
ORDER BY size_raw DESC LIMIT 3;
┌───────────┬───────┬─────────┐
│ size │ count │ type │
│ varchar │ int64 │ varchar │
├───────────┼───────┼─────────┤
│ 246.95 GB │ 29 │ .duckdb │
│ 90.33 GB │ 3776 │ .tmp │
│ 26.17 GB │ 28175 │ .csv │
└───────────┴───────┴─────────┘
About hostfs
The HostFS extension allows you to navigate and explore the filesystem using SQL. It provides a set of functions to list files, get file metadata, and more. For more information, please see the HostFS documentation.
Added Functions
function_name | function_type | description | comment | example |
---|---|---|---|---|
absolute_path | scalar | |||
cd | table | |||
cd | pragma | |||
file_extension | scalar | |||
file_last_modified | scalar | |||
file_name | scalar | |||
file_size | scalar | |||
hostfs | scalar | |||
hsize | scalar | |||
is_dir | scalar | |||
is_file | scalar | |||
ls | table | |||
ls | pragma | |||
lsr | pragma | |||
lsr | table | |||
path_exists | scalar | |||
path_type | scalar | |||
pwd | scalar | |||
pwd | pragma |