Search Shortcut cmd + k | ctrl + k
finetype

Semantic type classification — detects 168 data types (emails, URLs, dates, UUIDs, etc.) from raw strings

Maintainer(s): hughcameron

Installing and Loading

INSTALL finetype FROM community;
LOAD finetype;

Example

-- Classify a value into one of 168 semantic types
D SELECT finetype('[email protected]') AS detected_type;
┌───────────────────────┐
     detected_type     
        varchar        
├───────────────────────┤
 identity.person.email 
└───────────────────────┘

-- Get detailed classification with confidence and recommended DuckDB type
D SELECT finetype_detail('192.168.1.1') AS detail;
┌───────────────────────────────────────────────────────────────────────────────┐
                                    detail                                     
                                    varchar                                    
├───────────────────────────────────────────────────────────────────────────────┤
 {"type":"technology.internet.ip_v4","confidence":0.9998,"duckdb_type":"INET"} 
└───────────────────────────────────────────────────────────────────────────────┘

-- Normalize values for safe TRY_CAST to detected types
D SELECT finetype_cast('2024-01-15') AS normalized;
┌────────────┐
 normalized 
  varchar   
├────────────┤
 2024-01-15 
└────────────┘

-- Classify all scalar values in a JSON document
D SELECT finetype_unpack('{"email":"[email protected]","active":true}') AS annotated;

About finetype

FineType is a tiered CharCNN semantic type classifier that detects 168 data types from raw string values. It uses a hierarchical architecture of 34 character-level convolutional neural networks to classify values into a three-level taxonomy: domain.category.type.

Functions

finetype(value VARCHAR) → VARCHAR

Classify a single value. Returns the full semantic type label.

SELECT finetype('https://example.com');  -- technology.internet.url
SELECT finetype('2024-01-15');           -- datetime.date.iso
SELECT finetype('true');                 -- representation.boolean.terms

finetype_detail(value VARCHAR) → VARCHAR

Classify with full detail — returns JSON with type, confidence (0.0–1.0), and recommended DuckDB type.

SELECT finetype_detail('[email protected]');
-- {"type":"identity.person.email","confidence":0.9992,"duckdb_type":"VARCHAR"}

finetype_cast(value VARCHAR) → VARCHAR

Normalize a value for safe TRY_CAST() to its detected DuckDB type. Handles date format conversion (US/EU → ISO), boolean normalization, UUID lowercasing, numeric cleanup.

SELECT finetype_cast('01/15/2024');  -- 2024-01-15 (US date → ISO)
SELECT finetype_cast('true');        -- true (boolean normalization)

finetype_unpack(json VARCHAR) → VARCHAR

Recursively classify all scalar values in a JSON document. Returns annotated JSON with type/confidence/duckdb_type for each field.

finetype_version() → VARCHAR

Returns the extension version string.

Type Taxonomy

168 types organized into 6 domains:

  • datetime: dates, times, timestamps, epochs, durations, offsets (35 types)
  • technology: URLs, IPs, UUIDs, booleans, versions, codes (30 types)
  • geography: coordinates, locations, addresses, transportation (17 types)
  • identity: names, emails, phones, payments, medical (30 types)
  • representation: booleans, numbers, text, files, scientific (25 types)
  • container: JSON, XML, CSV, arrays, key-value (11 types)

Use Cases

Data profiling — Detect column types in messy CSV/Parquet data:

SELECT column_name, finetype(value) AS detected_type, count(*) AS cnt
FROM my_table UNPIVOT (value FOR column_name IN (*))
GROUP BY column_name, detected_type
ORDER BY column_name, cnt DESC;

Schema inference — Get DuckDB-native type recommendations:

SELECT column_name,
       finetype_detail(value)::JSON->>'duckdb_type' AS recommended_type
FROM my_table UNPIVOT (value FOR column_name IN (*))
GROUP BY ALL;

For more information, see the FineType documentation.

Added Functions

function_name function_type description comment examples
finetype scalar NULL NULL  
finetype_cast scalar NULL NULL  
finetype_detail scalar NULL NULL  
finetype_unpack scalar NULL NULL  
finetype_version scalar NULL NULL