Cheminformatics toolkit for DuckDB - SMILES, InChI, MOL/SDF, PDB, and SELFIES molecular analysis from SQL
Maintainer(s):
nkwork9999
Installing and Loading
INSTALL ducksmiles FROM community;
LOAD ducksmiles;
Example
-- Validate SMILES
SELECT mol_is_valid('CCO'); -- true
SELECT mol_is_valid('invalid_xyz'); -- false
-- Molecular formula (Hill system)
SELECT mol_formula('CCO'); -- C2H6O (ethanol)
SELECT mol_formula('c1ccccc1'); -- C6H6 (benzene)
SELECT mol_formula('O'); -- H2O (water)
-- Molecular weight
SELECT round(mol_weight('c1ccccc1'), 2); -- 78.11
-- InChI layer extraction
SELECT inchi_formula('InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)'); -- C2H4O2
-- Convert SMILES to SELFIES (ML-friendly notation)
SELECT smiles_to_selfies('CCO');
-- Batch processing
SELECT smiles, mol_formula(smiles) AS formula, mol_num_atoms(smiles) AS atoms
FROM (VALUES ('CCO'), ('c1ccccc1'), ('CC(=O)Oc1ccccc1C(=O)O')) AS t(smiles);
About ducksmiles
Cheminformatics toolkit for DuckDB: analyze molecular structures directly from SQL without leaving your database. The chemistry logic is implemented in pure Rust with no external chemistry library dependencies (no RDKit required).
Supported Formats:
- SMILES: Molecular validation, formula, weight, atom/bond counts
- InChI/InChIKey: Layer extraction, stereochemistry detection, skeleton matching
- MOL/SDF: V2000/V3000 block parsing, molecule counting
- PDB/CIF/XYZ: Protein structure analysis (atom, chain, residue, model counts)
- SELFIES: Bidirectional SMILES-SELFIES conversion for ML pipelines
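The bidirectional SMILES-SELFIES conversion listed above can be sketched as a round trip. This is a minimal sketch, not a documented usage: `smiles_to_selfies` appears in the Example section with a single string argument, while the single-argument forms of `selfies_to_smiles` and `selfies_is_valid` are assumptions based only on the function names in the Added Functions table. The column aliases are illustrative.

```sql
-- Round-trip sketch: SMILES -> SELFIES -> SMILES.
-- Assumption: selfies_to_smiles and selfies_is_valid each take one VARCHAR argument.
SELECT
    smiles,
    smiles_to_selfies(smiles)                    AS selfies,
    selfies_is_valid(smiles_to_selfies(smiles))  AS selfies_ok,
    selfies_to_smiles(smiles_to_selfies(smiles)) AS smiles_roundtrip
FROM (VALUES ('CCO'), ('c1ccccc1')) AS t(smiles);
```

The round-tripped string may not be character-for-character identical to the input, since one molecule can be written as several equivalent SMILES strings; comparing `mol_formula` of both sides is one way to check that the molecule is preserved.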
35 scalar SQL functions (listed under Added Functions) for molecular property extraction, format conversion, and structural comparison. Suited to cheminformatics datasets, drug discovery pipelines, and molecular ML feature engineering.
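As an illustration of feature engineering, the following sketch builds a small feature table using only functions shown in the Example section. The `compounds` table, its columns, and the output aliases are hypothetical. Filtering with `mol_is_valid` first is a defensive choice, since the behavior of the other functions on unparseable input is not documented here.

```sql
-- Hypothetical input table; names and rows are illustrative only.
CREATE TEMP TABLE compounds AS
SELECT * FROM (VALUES
    ('ethanol', 'CCO'),
    ('benzene', 'c1ccccc1'),
    ('broken',  'invalid_xyz')
) AS t(name, smiles);

-- Keep only parseable SMILES, then derive per-molecule features.
SELECT
    name,
    smiles,
    mol_formula(smiles)          AS formula,
    round(mol_weight(smiles), 2) AS weight,
    mol_num_atoms(smiles)        AS num_atoms
FROM compounds
WHERE mol_is_valid(smiles);
```

The same query shape applies to a Parquet or CSV source by replacing `compounds` with the corresponding table or file reference.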
Architecture: Rust (core logic) + C++ (DuckDB integration via FFI)
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| inchi_charge | scalar | NULL | NULL | |
| inchi_connections | scalar | NULL | NULL | |
| inchi_formula | scalar | NULL | NULL | |
| inchi_has_stereo | scalar | NULL | NULL | |
| inchi_hydrogens | scalar | NULL | NULL | |
| inchi_is_standard | scalar | NULL | NULL | |
| inchi_is_valid | scalar | NULL | NULL | |
| inchi_num_stereo_centers | scalar | NULL | NULL | |
| inchi_skeleton_match | scalar | NULL | NULL | |
| inchi_stereo_bond | scalar | NULL | NULL | |
| inchi_stereo_tetrahedral | scalar | NULL | NULL | |
| inchi_version | scalar | NULL | NULL | |
| inchikey_connectivity | scalar | NULL | NULL | |
| inchikey_is_valid | scalar | NULL | NULL | |
| inchikey_protonation | scalar | NULL | NULL | |
| inchikey_stereo | scalar | NULL | NULL | |
| mol_block_formula | scalar | NULL | NULL | |
| mol_block_name | scalar | NULL | NULL | |
| mol_block_num_atoms | scalar | NULL | NULL | |
| mol_block_num_bonds | scalar | NULL | NULL | |
| mol_block_weight | scalar | NULL | NULL | |
| mol_exact_mass | scalar | NULL | NULL | |
| mol_formula | scalar | NULL | NULL | |
| mol_is_valid | scalar | NULL | NULL | |
| mol_num_atoms | scalar | NULL | NULL | |
| mol_num_bonds | scalar | NULL | NULL | |
| mol_weight | scalar | NULL | NULL | |
| sdf_count | scalar | NULL | NULL | |
| selfies_is_valid | scalar | NULL | NULL | |
| selfies_to_smiles | scalar | NULL | NULL | |
| smiles_to_selfies | scalar | NULL | NULL | |
| structure_atom_count | scalar | NULL | NULL | |
| structure_chain_count | scalar | NULL | NULL | |
| structure_model_count | scalar | NULL | NULL | |
| structure_residue_count | scalar | NULL | NULL | |
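Because the table above carries no descriptions or signatures, it can help to inspect the loaded functions through DuckDB's built-in `duckdb_functions()` table function. The sketch below assumes the extension has already been loaded; the name-prefix pattern is derived from the table above and may also match functions registered by other extensions.

```sql
-- List parameter and return types for functions with the prefixes used above.
SELECT function_name, parameters, parameter_types, return_type
FROM duckdb_functions()
WHERE regexp_matches(
    function_name,
    '^(inchi|inchikey|mol|sdf|selfies|smiles|structure)_'
)
ORDER BY function_name;
```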
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
This extension does not add any settings.