LLM & RAG extension to combine analytics and semantic analysis
Installing and Loading
INSTALL flockmtl FROM community;
LOAD flockmtl;
Example
-- After loading, any function call will throw an error if the provider's secret doesn't exist
-- Create your provider secret by following the [documentation](https://dais-polymtl.github.io/flockmtl/docs/what-is-flockmtl/). For example, you can create a default OpenAI API key as follows:
D CREATE SECRET (TYPE OPENAI, API_KEY 'your-api-key');
-- Call an OpenAI model with a predefined prompt ('Tell me hello world') and default model ('gpt-4o-mini')
D SELECT llm_complete({'model_name': 'default'}, {'prompt_name': 'hello-world'});
┌──────────────────────────────────────────┐
│ llm_complete(hello_world, default_model) │
│                 varchar                  │
├──────────────────────────────────────────┤
│ Hello world                              │
└──────────────────────────────────────────┘
-- Check the prompts and supported models
D GET PROMPTS;
D GET MODELS;
-- Create a new prompt for summarizing text
D CREATE PROMPT('summarize', 'summarize the text into 1 word: {{text}}');
-- Create a named model ('summarizer-model') to use for summarization
D CREATE MODEL('summarizer-model', 'gpt-4o', {'context_window': 128000, 'max_output_tokens': 16400});
-- Summarize the text passed as the 'text' parameter
D SELECT llm_complete({'model_name': 'summarizer-model'}, {'prompt_name': 'summarize'}, {'text': 'We support more functions and approaches to combine relational analytics and semantic analysis. Check our repo for documentation and examples.'});
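-- The same pattern scales to whole tables, combining relational analytics with semantic analysis.
-- A minimal sketch; the 'articles' table below is hypothetical and only for illustration:
D CREATE TABLE articles (id INTEGER, body VARCHAR);
D INSERT INTO articles VALUES (1, 'DuckDB is an in-process analytical database.'), (2, 'FlockMTL adds LLM and RAG functions to SQL.');
D SELECT id, llm_complete({'model_name': 'summarizer-model'}, {'prompt_name': 'summarize'}, {'text': body}) AS summary FROM articles;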
About flockmtl
FlockMTL is an experimental DuckDB extension that enables seamless integration of large language models (LLMs) and retrieval-augmented generation (RAG) directly within SQL.
It introduces MODEL and PROMPT objects as first-class SQL entities, making it easy to define, manage, and reuse LLM interactions. Core functions like llm_complete, llm_filter, and llm_rerank allow you to perform generation, semantic filtering, and ranking, all from SQL.
FlockMTL is designed for rapid prototyping of LLM-based analytics and is optimized with batching and caching features for better performance.
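For example, semantic filtering and aggregation can sit directly inside an ordinary query. The following is a minimal sketch, assuming a hypothetical reviews table with a content column and previously created prompts named 'is_positive' and 'summarize':

SELECT llm_reduce({'model_name': 'default'}, {'prompt_name': 'summarize'}, {'text': content}) AS digest
FROM reviews
WHERE llm_filter({'model_name': 'default'}, {'prompt_name': 'is_positive'}, {'text': content});

Here llm_filter returns a boolean per row and llm_reduce aggregates the remaining rows, so the query reads like plain SQL while the LLM calls happen behind the scenes.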
📄 For more details and examples, see the FlockMTL documentation.
Note: FlockMTL is part of ongoing research by the Data & AI Systems (DAIS) Laboratory @ Polytechnique Montréal. It is under active development, and some features may evolve. Feedback and contributions are welcome!
Added Functions
function_name | function_type | description | comment | examples |
---|---|---|---|---|
llm_complete | scalar | Generates text completions using a specified language model | Requires a defined prompt and model | [SELECT llm_complete({'model_name': 'default'}, {'prompt_name': 'hello-world'});] |
llm_complete_json | scalar | Produces JSON-formatted text completions | Useful for structured outputs | [SELECT llm_complete_json({'model_name': 'default'}, {'prompt_name': 'hello-world'});] |
llm_filter | scalar | Filters data based on language model evaluations | Returns boolean values | [SELECT * FROM data WHERE llm_filter({'model_name': 'default'}, {'prompt_name': 'is_relevant'}, {'text': content});] |
llm_embedding | scalar | Generates embeddings for input text | Useful for semantic similarity tasks | [SELECT llm_embedding({'model_name': 'default'}, {'text': 'Sample text'});] |
llm_reduce | aggregate | Aggregates multiple inputs into a single output using a language model | Summarizes or combines multiple rows | [SELECT llm_reduce({'model_name': 'default'}, {'prompt_name': 'summarize'}, {'text': content}) FROM documents;] |
llm_reduce_json | aggregate | Similar to llm_reduce but outputs JSON-formatted results | Useful for structured summaries | [SELECT llm_reduce_json({'model_name': 'default'}, {'prompt_name': 'summarize'}, {'text': content}) FROM documents;] |
llm_rerank | aggregate | Reorders query results based on relevance scores from a language model | Enhances result relevance in search applications | [SELECT llm_rerank({'model_name': 'default'}, {'prompt_name': 'rank_relevance'}, {'text': content}) FROM search_results;] |
llm_first | aggregate | Selects the top-ranked result after reranking | Retrieves the most relevant item | [SELECT llm_first({'model_name': 'default'}, {'prompt_name': 'rank_relevance'}, {'text': content}) FROM search_results;] |
llm_last | aggregate | Selects the bottom-ranked result after reranking | Retrieves the least relevant item | [SELECT llm_last({'model_name': 'default'}, {'prompt_name': 'rank_relevance'}, {'text': content}) FROM search_results;] |
fusion_rrf | scalar | Implements Reciprocal Rank Fusion (RRF) to combine rankings | Combines rankings from multiple scoring systems | [SELECT fusion_rrf(score1, score2) FROM combined_scores;] |
fusion_combsum | scalar | Sums normalized scores from different scoring systems | Useful for aggregating scores from various models | [SELECT fusion_combsum(score1, score2) FROM combined_scores;] |
fusion_combmnz | scalar | Sums normalized scores and multiplies by the hit count | Enhances the impact of frequently occurring items | [SELECT fusion_combmnz(score1, score2) FROM combined_scores;] |
fusion_combmed | scalar | Computes the median of normalized scores | Reduces the effect of outliers in combined scores | [SELECT fusion_combmed(score1, score2) FROM combined_scores;] |
fusion_combanz | scalar | Calculates the average of normalized scores | Provides a balanced aggregation of scores | [SELECT fusion_combanz(score1, score2) FROM combined_scores;] |
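The fusion functions follow the call form shown above and are typically used to merge scores from different retrieval signals, such as keyword and embedding-based search. A minimal hybrid-scoring sketch; the search_scores table and its columns are hypothetical:

SELECT doc_id, fusion_rrf(keyword_score, vector_score) AS fused
FROM search_scores
ORDER BY fused DESC
LIMIT 10;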