ducksmiles

Cheminformatics toolkit for DuckDB - SMILES, InChI, MOL/SDF, PDB, and SELFIES molecular analysis from SQL

Maintainer(s): nkwork9999

Installing and Loading

INSTALL ducksmiles FROM community;
LOAD ducksmiles;
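
To confirm that the extension is installed and loaded in the current session, one option is to query DuckDB's built-in `duckdb_extensions()` table function. This is a minimal sketch; it assumes the extension registers under the name `ducksmiles`, as the install command suggests.

```sql
-- Check the install/load status of the extension.
-- Assumption: the registered extension name is 'ducksmiles'.
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name = 'ducksmiles';
```

If both `installed` and `loaded` are true, the functions listed under Added Functions should be callable.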

Example

-- Validate SMILES
SELECT mol_is_valid('CCO');          -- true
SELECT mol_is_valid('invalid_xyz');  -- false

-- Molecular formula (Hill system)
SELECT mol_formula('CCO');           -- C2H6O  (ethanol)
SELECT mol_formula('c1ccccc1');      -- C6H6   (benzene)
SELECT mol_formula('O');             -- H2O    (water)

-- Molecular weight
SELECT round(mol_weight('c1ccccc1'), 2);  -- 78.11

-- InChI layer extraction
SELECT inchi_formula('InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)');  -- C2H4O2

-- Convert SMILES to SELFIES (ML-friendly notation)
SELECT smiles_to_selfies('CCO');

-- Batch processing
SELECT smiles, mol_formula(smiles) AS formula, mol_num_atoms(smiles) AS atoms
FROM (VALUES ('CCO'), ('c1ccccc1'), ('CC(=O)Oc1ccccc1C(=O)O')) AS t(smiles);
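
As a sanity check on the molecular weight example, the benzene value can be reproduced by hand from its formula, C6H6. The sketch below uses plain DuckDB arithmetic and does not need the extension. The atomic weights (C = 12.011, H = 1.008) are the commonly quoted standard values; whether ducksmiles uses exactly these internally is an assumption.

```sql
-- Hand calculation for benzene (C6H6): 6 carbons + 6 hydrogens.
-- Assumption: standard atomic weights C = 12.011, H = 1.008.
SELECT round(6 * 12.011 + 6 * 1.008, 2) AS benzene_weight;  -- 78.11
```

The result matches the rounded output of `mol_weight('c1ccccc1')` shown above.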

About ducksmiles

ducksmiles is a cheminformatics toolkit for DuckDB that analyzes molecular structures directly from SQL, without leaving the database. It is a pure Rust implementation with no external chemistry library dependencies (RDKit is not required).

Supported Formats:

  • SMILES: Molecular validation, formula, weight, atom/bond counts
  • InChI/InChIKey: Layer extraction, stereochemistry detection, skeleton matching
  • MOL/SDF: V2000/V3000 block parsing, molecule counting
  • PDB/CIF/XYZ: Protein structure analysis (atom, chain, residue, model counts)
  • SELFIES: Bidirectional SMILES-SELFIES conversion for ML pipelines
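
The formats above can be exercised with queries like the following sketch. The function names come from the Added Functions list, but this page does not document their signatures, so the single VARCHAR argument used here is an assumption, and no expected outputs are shown.

```sql
-- Assumption: each function below accepts one VARCHAR argument.

-- SELFIES: convert SMILES to SELFIES and back again.
SELECT smiles_to_selfies('CCO')                    AS selfies,
       selfies_to_smiles(smiles_to_selfies('CCO')) AS round_trip;

-- InChI: validate an identifier and check for a stereo layer
-- (the InChI string is the acetic acid example used earlier on this page).
SELECT inchi_is_valid('InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)')   AS is_valid,
       inchi_has_stereo('InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)') AS has_stereo;
```

The round-trip SMILES string need not be character-for-character identical to the input, since one molecule can have several valid SMILES spellings.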

The extension provides 35 scalar SQL functions (listed under Added Functions) for molecular property extraction, format conversion, and structural comparison. It is aimed at cheminformatics datasets, drug discovery pipelines, and molecular ML feature engineering.
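
A feature-engineering step might look like the sketch below, which derives a small feature table from a column of SMILES strings. The `compounds` table and its rows are hypothetical. `mol_is_valid`, `mol_formula`, `mol_weight`, and `mol_num_atoms` are used as in the Example section; the single-argument form of `mol_exact_mass` and `mol_num_bonds` is an assumption, as their signatures are not documented here.

```sql
-- Hypothetical input table of named compounds.
CREATE TEMP TABLE compounds (name VARCHAR, smiles VARCHAR);
INSERT INTO compounds VALUES
    ('ethanol', 'CCO'),
    ('benzene', 'c1ccccc1'),
    ('aspirin', 'CC(=O)Oc1ccccc1C(=O)O'),
    ('broken',  'invalid_xyz');

-- Keep only rows whose SMILES validates, then compute per-molecule features.
SELECT name,
       mol_formula(smiles)    AS formula,
       mol_weight(smiles)     AS weight,
       mol_exact_mass(smiles) AS exact_mass,  -- assumed signature
       mol_num_atoms(smiles)  AS atoms,
       mol_num_bonds(smiles)  AS bonds        -- assumed signature
FROM compounds
WHERE mol_is_valid(smiles);
```

Filtering on `mol_is_valid` first keeps malformed input out of the feature table, which matters because this page does not state how the other functions behave on invalid SMILES.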

Architecture: Rust (core logic) + C++ (DuckDB integration via FFI)

Added Functions

function_name function_type description comment examples
inchi_charge scalar NULL NULL  
inchi_connections scalar NULL NULL  
inchi_formula scalar NULL NULL  
inchi_has_stereo scalar NULL NULL  
inchi_hydrogens scalar NULL NULL  
inchi_is_standard scalar NULL NULL  
inchi_is_valid scalar NULL NULL  
inchi_num_stereo_centers scalar NULL NULL  
inchi_skeleton_match scalar NULL NULL  
inchi_stereo_bond scalar NULL NULL  
inchi_stereo_tetrahedral scalar NULL NULL  
inchi_version scalar NULL NULL  
inchikey_connectivity scalar NULL NULL  
inchikey_is_valid scalar NULL NULL  
inchikey_protonation scalar NULL NULL  
inchikey_stereo scalar NULL NULL  
mol_block_formula scalar NULL NULL  
mol_block_name scalar NULL NULL  
mol_block_num_atoms scalar NULL NULL  
mol_block_num_bonds scalar NULL NULL  
mol_block_weight scalar NULL NULL  
mol_exact_mass scalar NULL NULL  
mol_formula scalar NULL NULL  
mol_is_valid scalar NULL NULL  
mol_num_atoms scalar NULL NULL  
mol_num_bonds scalar NULL NULL  
mol_weight scalar NULL NULL  
sdf_count scalar NULL NULL  
selfies_is_valid scalar NULL NULL  
selfies_to_smiles scalar NULL NULL  
smiles_to_selfies scalar NULL NULL  
structure_atom_count scalar NULL NULL  
structure_chain_count scalar NULL NULL  
structure_model_count scalar NULL NULL  
structure_residue_count scalar NULL NULL  
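
Because the descriptions and examples in the table above are empty, argument and return types can be looked up from DuckDB's catalog once the extension is loaded. The sketch below uses the built-in `duckdb_functions()` table function. The prefix filter is an assumption derived from the function names listed above, and it may also match unrelated functions that share a prefix.

```sql
-- List parameter and return types for the extension's functions.
-- Assumption: all ducksmiles functions start with one of these prefixes.
SELECT function_name, parameter_types, return_type
FROM duckdb_functions()
WHERE regexp_matches(
        function_name,
        '^(inchi|inchikey|mol|sdf|selfies|smiles|structure)_')
ORDER BY function_name;
```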

Overloaded Functions

This extension does not add any function overloads.

Added Types

This extension does not add any types.

Added Settings

This extension does not add any settings.