Search Shortcut cmd + k | ctrl + k
sitting_duck

Parse and analyze source code ASTs from 25+ programming languages with tree-sitter grammars and pattern matching

Maintainer(s): teaguesterling

Installing and Loading

INSTALL sitting_duck FROM community;
LOAD sitting_duck;

Example

-- Parse Python code and find function definitions
SELECT name, start_line, peek
FROM parse_ast('
def hello():
    return "hello world"
def greet(name):
    print(f"Hello, {name}!")
    return name
', 'python')
WHERE is_function_definition(semantic_type);

-- Pattern matching: find eval() calls and capture arguments
SELECT captures['X'].peek as dangerous_input, file_path, start_line
FROM ast_match('code', 'eval(__X__)', 'python');

-- Pattern matching with variadic wildcards
SELECT captures['F'].name as func_name, captures['Y'].peek as return_value
FROM ast_match('code',
    'def __F__(__):
    %__<BODY*>__%
    return __Y__', 'python');

-- Security audit: find dangerous function calls
SELECT * FROM ast_security_audit('code')
WHERE risk_level = 'high';

-- Function complexity metrics
SELECT name, cyclomatic, max_depth, lines
FROM ast_function_metrics('code')
WHERE cyclomatic > 10;

-- Cross-language analysis with semantic types
SELECT language, COUNT(*) as functions
FROM read_ast(['src/**/*.py', 'src/**/*.js', 'src/**/*.go'])
WHERE is_function_definition(semantic_type)
GROUP BY language;

About sitting_duck

Sitting Duck is a DuckDB extension for parsing source code into Abstract Syntax Trees (ASTs) using tree-sitter grammars. It provides a powerful SQL interface to analyze, search, and understand code structure across 25+ programming languages.

Documentation: https://sitting-duck.readthedocs.io/

Core Features

Pattern Matching (New in v1.3.0): Find code structures using pattern-by-example matching with wildcards:

-- Find all eval() calls
SELECT * FROM ast_match('code', 'eval(__X__)', 'python');

-- Variadic patterns for flexible matching
SELECT * FROM ast_match('code', 'def __F__(__): %__<BODY*>__% return __Y__', 'python');

See: https://sitting-duck.readthedocs.io/en/latest/guide/pattern-matching/

Table Functions:

  • read_ast(file_pattern, language := NULL) - Parse source files into AST rows
  • parse_ast(content, language) - Parse source code strings
  • ast_match(table, pattern, lang) - Pattern matching for code search

Analysis Macros:

  • ast_function_metrics(table) - Cyclomatic complexity, nesting depth, line counts
  • ast_security_audit(table) - Detect eval, exec, pickle, SQL injection patterns
  • ast_dead_code(table) - Find potentially unused functions/classes
  • ast_nesting_analysis(table) - Identify deeply nested code
  • ast_definitions(table) - All named definitions with categories

Tree Navigation:

  • ast_descendants(table, node_id) - Get subtree (O(1) using descendant_count)
  • ast_ancestors(table, node_id) - Path from node to root
  • ast_children(table, node_id) - Immediate children
  • ast_function_scope(table, node_id) - Function body excluding nested functions

Semantic Predicates: Cross-language filtering with normalized types:

  • is_function_definition(st), is_class_definition(st), is_variable_definition(st)
  • is_function_call(st), is_literal(st), is_conditional(st), is_loop(st)

Supported Languages (27)

Category Languages
Web JavaScript, TypeScript, HTML, CSS
Systems C, C++, Go, Rust, Zig
Scripting Python, Ruby, PHP, Lua, R, Bash
Enterprise Java, C#, Kotlin, Swift
Mobile Dart
Data SQL, JSON, TOML, GraphQL, HCL
Documentation Markdown

AST Schema

Each parsed node includes:

  • type, name - Node type and extracted identifier
  • semantic_type - Normalized type for cross-language queries
  • file_path, language - Source location
  • start_line, end_line, depth - Position and nesting
  • descendant_count - For O(1) subtree queries
  • peek - Configurable source preview
  • qualified_name, signature_type, parameters, modifiers, annotations - Native extraction

Full schema: https://sitting-duck.readthedocs.io/en/latest/api/output-schema/

Use Cases

  • Security auditing - Find dangerous patterns (eval, exec, SQL injection)
  • Code quality - Complexity metrics, dead code detection, nesting analysis
  • Refactoring - Pattern-based code search across large codebases
  • Documentation - Extract function signatures and structure
  • AI workflows - Structured code understanding for LLM tools

GitHub: https://github.com/teaguesterling/sitting_duck

Added Functions

function_name function_type description comment examples
ast_ancestors table_macro NULL NULL  
ast_call_arguments table_macro NULL NULL  
ast_children table_macro NULL NULL  
ast_class_members table_macro NULL NULL  
ast_containing_line table_macro NULL NULL  
ast_dead_code table_macro NULL NULL  
ast_definitions table_macro NULL NULL  
ast_descendants table_macro NULL NULL  
ast_function_metrics table_macro NULL NULL  
ast_function_scope table_macro NULL NULL  
ast_functions_containing table_macro NULL NULL  
ast_get_source macro NULL NULL  
ast_get_source_numbered macro NULL NULL  
ast_in_range table_macro NULL NULL  
ast_match table_macro NULL NULL  
ast_nesting_analysis table_macro NULL NULL  
ast_pattern table_macro NULL NULL  
ast_pattern_list macro NULL NULL  
ast_peek_contains_any scalar NULL NULL  
ast_security_audit table_macro NULL NULL  
ast_siblings table_macro NULL NULL  
ast_supported_languages table NULL NULL  
clean_pattern macro NULL NULL  
extract_wildcard_rules macro NULL NULL  
get_kind scalar NULL NULL  
get_line macro NULL NULL  
get_lines_text macro NULL NULL  
get_searchable_types scalar NULL NULL  
get_super_kind scalar NULL NULL  
get_variadic_names macro NULL NULL  
has_body scalar NULL NULL  
is_annotation macro NULL NULL  
is_arithmetic macro NULL NULL  
is_assignment macro NULL NULL  
is_block macro NULL NULL  
is_boolean_literal macro NULL NULL  
is_call scalar NULL NULL  
is_class_definition macro NULL NULL  
is_comment macro NULL NULL  
is_comparison macro NULL NULL  
is_conditional macro NULL NULL  
is_construct scalar NULL NULL  
is_control_flow scalar NULL NULL  
is_declaration_only scalar NULL NULL  
is_definition scalar NULL NULL  
is_directive macro NULL NULL  
is_embodied scalar NULL NULL  
is_export macro NULL NULL  
is_foreign macro NULL NULL  
is_function_call macro NULL NULL  
is_function_definition macro NULL NULL  
is_identifier scalar NULL NULL  
is_import macro NULL NULL  
is_jump macro NULL NULL  
is_kind scalar NULL NULL  
is_list macro NULL NULL  
is_literal macro NULL NULL  
is_logical macro NULL NULL  
is_loop macro NULL NULL  
is_member_access macro NULL NULL  
is_module_definition macro NULL NULL  
is_number_literal macro NULL NULL  
is_parser_specific scalar NULL NULL  
is_pattern_wildcard macro NULL NULL  
is_punctuation scalar NULL NULL  
is_semantic_type scalar NULL NULL  
is_string_literal macro NULL NULL  
is_syntax_only scalar NULL NULL  
is_type_composite macro NULL NULL  
is_type_definition macro NULL NULL  
is_type_generic macro NULL NULL  
is_type_primitive macro NULL NULL  
is_type_reference macro NULL NULL  
is_variable_definition macro NULL NULL  
kind_code scalar NULL NULL  
parse_ast table NULL NULL  
parse_ast_flat table NULL NULL  
parse_ast_hierarchical table NULL NULL  
parse_html_wildcard macro NULL NULL  
pattern_has_variadic macro NULL NULL  
read_ast table NULL NULL  
read_ast_flat table NULL NULL  
read_ast_hierarchical table NULL NULL  
read_ast_hierarchical_new table NULL NULL  
read_lines table_macro NULL NULL  
read_lines_context table_macro NULL NULL  
read_lines_range table_macro NULL NULL  
semantic_type_base macro NULL NULL  
semantic_type_code scalar NULL NULL  
semantic_type_to_string scalar NULL NULL  
string_contains_any scalar NULL NULL  
string_contains_any_i scalar NULL NULL  
wildcard_capture_name macro NULL NULL