flock

Search Shortcut cmd + k | ctrl + k

Documentation

flock

Downloads 751this week

GitHub stars 338

Extension repository on GitHub

Extension descriptor (YAML)

LLM & RAG extension to combine analytics and semantic analysis

Maintainer(s): anasdorbani, queryproc

Installing and Loading

INSTALL flock FROM community;
LOAD flock;

Example

-- After loading, any function call will throw an error if the provider's secret doesn't exist

-- Create your provider secret by following the [documentation](https://dais-polymtl.github.io/flock/docs/what-is-flock/). For example, you can create a default OpenAI API key as follows:
D CREATE SECRET (TYPE OPENAI, API_KEY 'your-api-key');

-- Call an OpenAI model with a predefined prompt ('Tell me hello world') and default model ('gpt-4o-mini')
D SELECT llm_complete({'model_name': 'default'}, {'prompt_name': 'hello-world'});
┌──────────────────────────────────────────┐
│ llm_complete(hello_world, default_model) │
│                 varchar                  │
├──────────────────────────────────────────┤
│                Hello world               │
└──────────────────────────────────────────┘

-- Check the prompts and supported models
D GET PROMPTS;
D GET MODELS;

-- Create a new prompt for summarizing text
D CREATE PROMPT('summarize', 'summarize the text into 1 word: {{text}}');

-- Create a variable name for the model to do the summarizing
D CREATE MODEL('summarizer-model', 'gpt-4o', 'openai');

-- Summarize text and pass it as parameter 
D SELECT llm_complete({'model_name': 'summarizer-model'}, {'prompt_name': 'summarize','context_columns': [{'data': 'We support more functions and approaches to combine relational analytics and semantic analysis. Check our repo for documentation and examples.'}}]);

About flock

Flock is an experimental DuckDB extension that enables seamless integration of large language models (LLMs) and retrieval-augmented generation (RAG) directly within SQL.

It introduces MODEL and PROMPT objects as first-class SQL entities, making it easy to define, manage, and reuse LLM interactions. Core functions like llm_complete, llm_filter, and llm_rerank allow you to perform generation, semantic filtering, and ranking—all from SQL.

Flock is designed for rapid prototyping of LLM-based analytics and is optimized with batching and caching features for better performance.

📄 For more details and examples, see the Flock documentation.

Note: Flock is part of ongoing research by the Data & AI Systems (DAIS) Laboratory @ Polytechnique Montréal. It is under active development, and some features may evolve. Feedback and contributions are welcome!

Added Functions

function_name	function_type	description	comment	examples
llm_complete	scalar	Generates text completions using a specified language model	Requires model config and prompt; supports text and image inputs	[SELECT llm_complete({'model_name': 'gpt-4o'}, {'prompt': 'Explain the purpose of Flock.'});]
llm_filter	scalar	Filters data based on language model evaluations returning boolean values	Commonly used in WHERE clause; supports text and image inputs	[SELECT * FROM data WHERE llm_filter({'model_name': 'gpt-4o'}, {'prompt': 'Is this eco-friendly?', 'context_columns': [{'data': content}]});]
llm_embedding	scalar	Generates embeddings for input text	Useful for semantic similarity; text only (no image support)	[SELECT llm_embedding({'model_name': 'text-embedding-3-small'}, {'context_columns': [{'data': product_name}]}) FROM products;]
llm_reduce	aggregate	Aggregates multiple inputs into a single output using a language model	Use with GROUP BY; summarizes or combines multiple rows	[SELECT category, llm_reduce({'model_name': 'gpt-4o'}, {'prompt': 'Summarize the following', 'context_columns': [{'data': content}]}) FROM documents GROUP BY category;]
llm_rerank	aggregate	Reorders query results based on relevance scores from a language model	Uses sliding window for long lists; returns JSON array of reranked rows	[SELECT llm_rerank({'model_name': 'gpt-4o'}, {'prompt': 'AI and machine learning', 'context_columns': [{'data': document_title}, {'data': document_content}]}) FROM search_results;]
llm_first	aggregate	Selects the top-ranked result after reranking by relevance	Returns single JSON object; use with or without GROUP BY	[SELECT llm_first({'model_name': 'gpt-4o'}, {'prompt': 'high-performance computing', 'context_columns': [{'data': product_name}, {'data': product_description}]}) FROM products;]
llm_last	aggregate	Selects the bottom-ranked result after reranking by relevance	Returns single JSON object; use with or without GROUP BY	[SELECT llm_last({'model_name': 'gpt-4o'}, {'prompt': 'premium audio quality', 'context_columns': [{'data': product_name}, {'data': product_description}]}) FROM products;]
fusion_rrf	scalar	Implements Reciprocal Rank Fusion (RRF) to combine rankings	Input: document ranks (1 = best); use DENSE_RANK() for rank-based input	[SELECT fusion_rrf(bm25_rank, embedding_rank) AS combined_score FROM ranked_results;]
fusion_combsum	scalar	Sums normalized scores from different scoring systems	Input: normalized scores (0-1); NULL/NaN/0 treated as 0	[SELECT fusion_combsum(bm25_normalized, embedding_normalized) FROM combined_scores;]
fusion_combmnz	scalar	Sums normalized scores multiplied by hit count	Enhances impact of frequently occurring items across scoring systems	[SELECT fusion_combmnz(score1, score2) FROM combined_scores;]
fusion_combmed	scalar	Computes the median of normalized scores	Reduces effect of outliers in combined scores	[SELECT fusion_combmed(score1, score2) FROM combined_scores;]
fusion_combanz	scalar	Calculates the average of normalized scores	Provides balanced aggregation of scores; NULL/NaN/0 treated as 0	[SELECT fusion_combanz(score1, score2) FROM combined_scores;]
flock_get_metrics	scalar	Returns usage metrics for LLM function calls in the current session	Returns JSON with api_calls tokens and timing per function	[SELECT flock_get_metrics();]
flock_get_debug_metrics	scalar	Returns detailed debug metrics including registration order	Useful for debugging multi-function queries	[SELECT flock_get_debug_metrics();]
flock_reset_metrics	scalar	Resets all metrics for the current session	Returns confirmation message	[SELECT flock_reset_metrics();]

Installing and Loading

Example

About flock

Added Functions

Overloaded Functions

Added Types

Added Settings

In this article