SQLModel/RDKit database cartridge integration

This repo demonstrates how to integrate SQLModel with the RDKit database cartridge in the simplest way possible.
You may want to check out an existing solution like razi, but it requires some modifications to work with SQLModel at the time of writing, such as updating similarity search operators and result value processing.

Installation

$ git clone https://github.com/savvan0h/sqlmodel-rdkit-example.git
$ cd sqlmodel-rdkit-example
$ python -m venv venv
$ . ./venv/bin/activate
$ pip install .

(Optional) Create database

If you don’t have your own database, you can use the rvianello/postgres-rdkit:16.2-2024.03.1 image to set up a PostgreSQL database with the RDKit extension.

Warning

PostgreSQL with RDKit must use the same RDKit version as your Python environment.

$ docker run \
    --name postgres-rdkit \
    -p 5432:5432 \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -d rvianello/postgres-rdkit:16.2-2024.03.1
$ docker exec postgres-rdkit psql -U postgres postgres -c 'CREATE EXTENSION rdkit'

Create table

from sqlmodel import create_engine

from models import Compound

# Change the connection string as needed
engine = create_engine("postgresql+psycopg://postgres:mysecretpassword@localhost:5432/postgres")
Compound.__table__.create(engine)

Load example data

from sqlmodel import Session

from models import morganbv_fp
from molecules import SMILES_SAMPLE

session = Session(engine)
for i, smiles in enumerate(SMILES_SAMPLE):
    compound = Compound(
        name=f"Compound {i}",
        molecule=smiles,
        mfp2=morganbv_fp(smiles)
    )
    session.add(compound)
session.commit()

Query examples

Exact structure search( `@=` )

Python

from sqlmodel import select

statement = select(Compound).where(Compound.molecule == "COC(c1ccccc1)c1ccccc1") # value can be a mol object
session.exec(statement).all()

SQL

SELECT compound.id,
       compound.name,
       mol_to_pkl(compound.molecule) AS molecule,
       bfp_to_binary_text(compound.mfp2) AS mfp2
FROM compound
WHERE compound.molecule @= mol_from_pkl(%(molecule_1)s)

Substructure search

hassubstruct( `@>` )

Python

statement = select(Compound).where(Compound.molecule.hassubstruct("C1=C(C)C=CC=C1"))
session.exec(statement).all()

SQL

SELECT compound.id,
       compound.name,
       mol_to_pkl(compound.molecule) AS molecule,
       bfp_to_binary_text(compound.mfp2) AS mfp2
FROM compound
WHERE compound.molecule @> mol_from_pkl(%(molecule_1)s)

SMARTS-based query

Python

from sqlmodel import cast

from models import QMol

qmol = cast("c1[c,n]cccc1", QMol)
statement = select(Compound).where(Compound.molecule.hassubstruct(qmol))
session.exec(statement).all()

SQL

SELECT compound.id,
       compound.name,
       mol_to_pkl(compound.molecule) AS molecule,
       bfp_to_binary_text(compound.mfp2) AS mfp2
FROM compound
WHERE compound.molecule @> CAST(%(param_1)s AS qmol)

issubstruct( `<@` )

Python

statement = select(Compound).where(Compound.molecule.issubstruct("CCN1c2ccccc2Sc2ccccc21"))
session.exec(statement).all()

SQL

SELECT compound.id,
       compound.name,
       mol_to_pkl(compound.molecule) AS molecule,
       bfp_to_binary_text(compound.mfp2) AS mfp2
FROM compound
WHERE compound.molecule <@ mol_from_pkl(%(molecule_1)s)

Similarity search

Python

smiles = "CCN1c2ccccc2Sc2ccccc21"
statement = select(Compound).where(Compound.mfp2.tanimoto_sml(morganbv_fp(smiles)))
session.exec(statement).all()

SQL

SELECT compound.id,
       compound.name,
       mol_to_pkl(compound.molecule) AS molecule,
       bfp_to_binary_text(compound.mfp2) AS mfp2
FROM compound
WHERE compound.mfp2 %% morganbv_fp(%(morganbv_fp_1)s::VARCHAR)

Adjusting the similarity cutoff

Python

from sqlmodel import text

session.exec(text("SET rdkit.tanimoto_threshold=0.6")) # default is 0.5

SQL

SET rdkit.tanimoto_threshold=0.6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

SQLModel/RDKit database cartridge integration

Installation

(Optional) Create database

Create table

Load example data

Query examples

Exact structure search( `@=` )

Substructure search

hassubstruct( `@>` )

SMARTS-based query

issubstruct( `<@` )

Similarity search

Adjusting the similarity cutoff

Files

README.md

Latest commit

History

README.md

File metadata and controls

SQLModel/RDKit database cartridge integration

Installation

(Optional) Create database

Create table

Load example data

Query examples

Exact structure search( @= )

Substructure search

hassubstruct( @> )

SMARTS-based query

issubstruct( <@ )

Similarity search

Adjusting the similarity cutoff

Exact structure search( `@=` )

hassubstruct( `@>` )

issubstruct( `<@` )