libmagic/file utilities ported to DuckDB
Maintainer(s):
carlopi
Installing and Loading
INSTALL magic FROM community;
LOAD magic;
Example
--- Discover autodetected types for files in a folder
SELECT magic_mime(file), magic_type(file), file
FROM glob('path/to/folder/**');
--- Inspect auto-detection capabilities of the `magic` extension
FROM magic_capabilities();
--- A remote GeoPackage — spatial, sniffed by content (runnable as-is):
FROM read_any('https://www.geopackage.org/data/gdal_sample_v1.2_no_extensions.gpkg');
--- A table inside a Postgres database (bring your own connection string):
FROM read_any('postgres://user:pass@host:5432/[email protected]');
--- An Iceberg table on AWS S3 Tables (bring your own ARN + CREATE SECRET):
FROM read_any('arn:aws:s3tables:us-east-1:123456789012:bucket/[email protected]');
About magic
Port of libmagic (that powers file UNIX utility), allow to classify files based on the content of the header, accoring to the libmagic library.
Packaged with version 5.45 of the magic library. The magic.mgc database is at the moment statically compiled in the library, so it's the same across platforms but immutable.
Implements read-from-any file capabilties, via magic auto-detection or name pattern.
Implements also sub-resource dispatcher via @schema.tbl, allow also catalogs to be queried as a table.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| magic_archive_members | scalar | NULL | NULL | |
| magic_capabilities | table_macro | NULL | NULL | |
| magic_mime | scalar | Returns the MIME type for the given file path using the libmagic database (e.g. 'application/json', 'text/plain'). | NULL | [SELECT magic_mime('myfile.json');, SELECT file, magic_mime(file) AS mime FROM glob('data/*/');] |
| magic_required_extensions | scalar | Returns the list of DuckDB extensions that must be loaded before reading the given file with read_any(). Returns an empty list for built-in formats (CSV, blob). | NULL | [SELECT magic_required_extensions('myfile.json');, SELECT file, magic_required_extensions(file) AS exts FROM glob('data/*/');] |
| magic_type | scalar | Returns the file type description for the given file path using the libmagic database (e.g. 'Apache Parquet', 'JSON data'). | NULL | [SELECT magic_type('myfile.parquet');, SELECT file, magic_type(file) AS type FROM glob('data/*/');] |
| read_any | table_macro | NULL | NULL | |
| read_any_impl | table_macro | NULL | NULL | |
| read_attacheable_database | table | NULL | NULL | |
| read_har | table_macro | NULL | NULL | |
| split_into_components | scalar | NULL | NULL |
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
This extension does not add any settings.