Semantic type classification — detects 168 data types (emails, URLs, dates, UUIDs, etc.) from raw strings
Installing and Loading
INSTALL finetype FROM community;
LOAD finetype;
Example
-- Classify a value into one of 168 semantic types
D SELECT finetype('[email protected]') AS detected_type;
┌───────────────────────┐
│ detected_type │
│ varchar │
├───────────────────────┤
│ identity.person.email │
└───────────────────────┘
-- Get detailed classification with confidence and recommended DuckDB type
D SELECT finetype_detail('192.168.1.1') AS detail;
┌───────────────────────────────────────────────────────────────────────────────┐
│ detail │
│ varchar │
├───────────────────────────────────────────────────────────────────────────────┤
│ {"type":"technology.internet.ip_v4","confidence":0.9998,"duckdb_type":"INET"} │
└───────────────────────────────────────────────────────────────────────────────┘
-- Normalize values for safe TRY_CAST to detected types
D SELECT finetype_cast('2024-01-15') AS normalized;
┌────────────┐
│ normalized │
│ varchar │
├────────────┤
│ 2024-01-15 │
└────────────┘
-- Classify all scalar values in a JSON document
D SELECT finetype_unpack('{"email":"[email protected]","active":true}') AS annotated;
About finetype
FineType is a tiered CharCNN semantic type classifier that detects 168 data types from raw string values.
It uses a hierarchical architecture of 34 character-level convolutional neural networks to classify values
into a three-level taxonomy: domain.category.type.
Functions
finetype(value VARCHAR) → VARCHAR
Classify a single value. Returns the full semantic type label.
SELECT finetype('https://example.com'); -- technology.internet.url
SELECT finetype('2024-01-15'); -- datetime.date.iso
SELECT finetype('true'); -- representation.boolean.terms
finetype_detail(value VARCHAR) → VARCHAR
Classify with full detail — returns JSON with type, confidence (0.0–1.0), and recommended DuckDB type.
SELECT finetype_detail('[email protected]');
-- {"type":"identity.person.email","confidence":0.9992,"duckdb_type":"VARCHAR"}
finetype_cast(value VARCHAR) → VARCHAR
Normalize a value for safe TRY_CAST() to its detected DuckDB type.
Handles date format conversion (US/EU → ISO), boolean normalization, UUID lowercasing, numeric cleanup.
SELECT finetype_cast('01/15/2024'); -- 2024-01-15 (US date → ISO)
SELECT finetype_cast('true'); -- true (boolean normalization)
finetype_unpack(json VARCHAR) → VARCHAR
Recursively classify all scalar values in a JSON document. Returns annotated JSON with type/confidence/duckdb_type for each field.
finetype_version() → VARCHAR
Returns the extension version string.
Type Taxonomy
168 types organized into 6 domains:
- datetime: dates, times, timestamps, epochs, durations, offsets (35 types)
- technology: URLs, IPs, UUIDs, booleans, versions, codes (30 types)
- geography: coordinates, locations, addresses, transportation (17 types)
- identity: names, emails, phones, payments, medical (30 types)
- representation: booleans, numbers, text, files, scientific (25 types)
- container: JSON, XML, CSV, arrays, key-value (11 types)
Use Cases
Data profiling — Detect column types in messy CSV/Parquet data:
SELECT column_name, finetype(value) AS detected_type, count(*) AS cnt
FROM my_table UNPIVOT (value FOR column_name IN (*))
GROUP BY column_name, detected_type
ORDER BY column_name, cnt DESC;
Schema inference — Get DuckDB-native type recommendations:
SELECT column_name,
finetype_detail(value)::JSON->>'duckdb_type' AS recommended_type
FROM my_table UNPIVOT (value FOR column_name IN (*))
GROUP BY ALL;
For more information, see the FineType documentation.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| finetype | scalar | NULL | NULL | |
| finetype_cast | scalar | NULL | NULL | |
| finetype_detail | scalar | NULL | NULL | |
| finetype_unpack | scalar | NULL | NULL | |
| finetype_version | scalar | NULL | NULL |