I want to calculate the Tanimoto similarity between two compounds. How can I use the RDKit package to do this?

1 Answer

Here are the steps to calculate the Tanimoto similarity between two compounds using the RDKit Python package:

  • Create molecule objects from the SMILES strings of the compounds.
  • Generate a fingerprint for each molecule.
  • Use the fingerprints to calculate the Tanimoto similarity between the compounds.

The following example shows which RDKit functions you need to calculate the Tanimoto similarity:
from rdkit import Chem
from rdkit import DataStructs
from rdkit.Chem import AllChem
smi1 = 'COC1=CC=C(C=C1)N2C(=O)C3=C(N=C2[C@H]4CCCN4)C(=CC(=C3)[N+](=O)[O-])C'
smi2 = 'CN1CCN(/C(=N/C=2C=CC(=CC2C(=O)NC=3C=CC=CC3)[N+](=O)[O-])/C1)C'
# create molecules from the given SMILES (MolFromSmiles returns None if parsing fails)
mol1 = Chem.MolFromSmiles(smi1)
mol2 = Chem.MolFromSmiles(smi2)
# generate RDKit (path-based) fingerprints
fpgen = AllChem.GetRDKitFPGenerator()
fp1 = fpgen.GetFingerprint(mol1)
fp2 = fpgen.GetFingerprint(mol2)
# compute the Tanimoto similarity (a value between 0 and 1) from the fingerprints
simi = DataStructs.TanimotoSimilarity(fp1, fp2)
print(simi)
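
For reference, the Tanimoto similarity of two bit-vector fingerprints is the number of bits set in both fingerprints divided by the number of bits set in at least one of them. Here is a minimal pure-Python sketch of that definition. The function name tanimoto and the example bit sets are hypothetical and chosen only for illustration; this is not how RDKit implements the calculation internally.

```python
def tanimoto(on_bits_a, on_bits_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(on_bits_a), set(on_bits_b)
    union = a | b
    if not union:
        # Both fingerprints are empty; returning 0.0 is a convention assumed for this sketch.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical on-bit indices for two small fingerprints
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}
score = tanimoto(fp_a, fp_b)  # 2 shared bits / 5 distinct bits
print(score)
```

With RDKit bit-vector fingerprints such as fp1 and fp2 above, the on-bit indices should be obtainable with list(fp1.GetOnBits()), so the same function can be used to cross-check the value returned by DataStructs.TanimotoSimilarity.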