Navigate and explore the filesystem using SQL
Maintainer(s):
Gropaul
Installing and Loading
INSTALL hostfs FROM community;
LOAD hostfs;
Example
-- Navigate to the workspace and list the files
D PRAGMA cd('/Users/paul/workspace');
D PRAGMA ls;
┌───────────────────────────────┐
│ path │
│ varchar │
├───────────────────────────────┤
│ ./duckdb │
│ ./playground │
│ ./hostfs │
-- Find the files you were working on last
D SELECT path, file_last_modified(path) AS date FROM ls() WHERE 'csv' IN file_extension(path) ORDER BY date LIMIT 1 ;
┌───────────────────────────┬─────────────────────┐
│ path │ date │
│ varchar │ timestamp │
├───────────────────────────┼─────────────────────┤
│ ./sketch_results_join.csv │ 2024-07-13 23:25:48 │
└───────────────────────────┴─────────────────────┘
-- List the top 3 file types by total size, with file count, ordered by size.
D SELECT size, count, file_extension AS "type"
FROM (
SELECT SUM(file_size(path)) AS size_raw, format_bytes(size_raw) AS size, COUNT(*) AS count, file_extension(path) AS file_extension
FROM lsr('/Users/paul/workspace', 10)
GROUP BY file_extension(path)
) AS subquery
ORDER BY size_raw DESC LIMIT 3;
┌───────────┬───────┬─────────┐
│ size │ count │ type │
│ varchar │ int64 │ varchar │
├───────────┼───────┼─────────┤
│ 246.95 GB │ 29 │ .duckdb │
│ 90.33 GB │ 3776 │ .tmp │
│ 26.17 GB │ 28175 │ .csv │
└───────────┴───────┴─────────┘
About hostfs
The HostFS extension allows you to navigate and explore the filesystem using SQL. It provides a set of functions to list files, get file metadata, and more. For more information, please see the HostFS documentation.
Added Functions
function_name | function_type | description | comment | examples |
---|---|---|---|---|
absolute_path | scalar | NULL | NULL | [] |
cd | pragma | NULL | NULL | [] |
cd | table | NULL | NULL | [] |
file_extension | scalar | NULL | NULL | [] |
file_last_modified | scalar | NULL | NULL | [] |
file_name | scalar | NULL | NULL | [] |
file_size | scalar | NULL | NULL | [] |
hostfs | scalar | NULL | NULL | [] |
hsize | scalar | NULL | NULL | [] |
is_dir | scalar | NULL | NULL | [] |
is_file | scalar | NULL | NULL | [] |
ls | pragma | NULL | NULL | [] |
ls | table | NULL | NULL | [] |
lsr | pragma | NULL | NULL | [] |
lsr | table | NULL | NULL | [] |
path_exists | scalar | NULL | NULL | [] |
path_type | scalar | NULL | NULL | [] |
pwd | pragma | NULL | NULL | [] |
pwd | scalar | NULL | NULL | [] |