hive_metastore

Connect to your Hive Metastore, attach it as a native DuckDB catalog and query the data inside with ease!

Maintainer(s): thijs-s

Installing and Loading

INSTALL hive_metastore FROM community;
LOAD hive_metastore;

Example

-- Attach the Hive Metastore as a catalog in DuckDB
ATTACH 'thrift://<host>:<port>' AS <catalog_name> (TYPE hive_metastore);

-- You are ready to rock!
SELECT * FROM <catalog_name>.<schema_name>.<table_name>;

-- To query tables that live in object storage, you still need to set up the matching storage extension (e.g. S3)
CREATE SECRET s3 (TYPE S3, KEY_ID 'access-key', SECRET 'secret-key', ENDPOINT 'localhost:9000');

About hive_metastore

The DuckDB Hive Metastore extension lets DuckDB connect to an Apache Hive Metastore over the Thrift protocol and query tables stored in DuckDB-supported formats. Tables registered in the metastore can then be queried with DuckDB's analytical engine without leaving the Hive ecosystem.

Key Features

  • Implementation of Hive catalog as a native DuckDB catalog
  • Automatic schema discovery, including support for complex data types such as arrays and maps
  • Support for Parquet, CSV, Iceberg, Delta, ORC, and Avro
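Because the metastore is exposed as a regular catalog, DuckDB's standard introspection statements are a natural way to inspect what schema discovery found. The following is a minimal sketch, assuming a catalog already attached under the hypothetical name hms with a hypothetical table analytics.events, and assuming the extension handles these statements the way other DuckDB catalogs do:

```sql
-- List tables across all attached catalogs (hms is a hypothetical catalog name)
SHOW ALL TABLES;

-- Show the discovered column names and types for a hypothetical table;
-- Hive arrays and maps are expected to surface as DuckDB LIST and MAP types
DESCRIBE hms.analytics.events;
```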

Usage

Using the extension comes down to attaching the Hive Metastore as a catalog in DuckDB and then querying its tables as if they were native DuckDB tables. The ATTACH command follows the same form as any other DuckDB ATTACH command:

ATTACH 'thrift://<host>:<port>' AS <catalog_name> (<args>);

Supported arguments include:

  • TYPE (required): Must be set to hive_metastore so that DuckDB uses the Hive Metastore extension for this catalog.
  • WAREHOUSE_LOCATION: The warehouse location path, used to resolve table storage locations. Usually not required, but can be useful in some cases.
  • DEFAULT_SCHEMA: The database/schema name to use when a query does not specify one. If omitted, the schema named default is used.
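To illustrate how the arguments above combine, here is a minimal sketch. The host metastore.example.com, the catalog name hms, the schema analytics, the table events, and the bucket path are all hypothetical placeholders. Port 9083 is the conventional Hive Metastore Thrift port. Quoting the option values as strings follows the usual DuckDB ATTACH style and is an assumption, not something this page confirms.

```sql
-- Hypothetical host, catalog name, schema, and bucket; adjust to your environment
ATTACH 'thrift://metastore.example.com:9083' AS hms (
    TYPE hive_metastore,
    WAREHOUSE_LOCATION 's3://my-bucket/warehouse',
    DEFAULT_SCHEMA 'analytics'
);

-- Fully qualified name: catalog.schema.table
SELECT count(*) FROM hms.analytics.events;

-- Schema omitted: expected to resolve against DEFAULT_SCHEMA (analytics here)
SELECT count(*) FROM hms.events;
```

If the tables live in object storage, the storage secret from the earlier example still has to be created before these queries can read the data files.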