Chemical scripts

Supported scripts

Name | Function
--- | ---
Substructure search | #{x.ChemSubstructureSearch}
Find MCS | #{x.ChemFindMCS}
Descriptors | #{x.ChemDescriptors}
R-Groups | #{x.ChemGetRGroups}
Fingerprints | #{x.ChemFingerprints}
Similarity SPE | #{x.ChemSimilaritySPE}
SMILES to InChI | #{x.ChemSmilesToInchi}
SMILES to Canonical | #{x.ChemSmilesToCanonical}
Chemical map identifiers | #{x.ChemMapIdentifiers}
Butina cluster | #{x.ChemScripts:ButinaMoleculesClustering}
Filter by catalogs | #{x.ChemScripts:FilterByCatalogs}
Gasteiger partial charges | #{x.ChemScripts:GasteigerPartialCharges}
Murcko scaffolds | #{x.ChemScripts:MurckoScaffolds}
Similarity maps using fingerprints | #{x.ChemScripts:SimilarityMapsUsingFingerprints}
Chemical space using tSNE | #{x.ChemScripts:ChemicalSpaceUsingtSNE}
Two component reactions | Chem:TwoComponentReaction
Chemical space using UMAP | #{x.ChemScripts:ChemicalSpaceUsingUMAP}
USRCAT | #{x.ChemScripts:USRCAT}
Mutate | [PLACEHOLDER]
Solubility prediction | #{x.18b704d0-0b50-11e9-b846-1fa94a4da5d1."Predict Solubility"}
Curate | [PLACEHOLDER]

The following table gives indicative execution times for selected chemical functions:

Indicative performance of chemical functions
Function | Molecules | Execution time (s)
--- | --- | ---
ChemSubstructureSearch | 1M | 70
ChemFindMcs | 100k | 43
ChemDescriptors (201 descriptors) | 1k | 81
ChemDescriptors (Lipinski) | 1M | 164
ChemGetRGroups | 1M | 233
ChemFingerprints (TopologicalTorsion) | 1M | 782
ChemFingerprints (MACCSKeys) | 1M | 770
ChemFingerprints (Morgan/Circular) | 1M | 737
ChemFingerprints (RDKFingerprint) | 1M | 2421
ChemFingerprints (AtomPair) | 1M | 1574
ChemSmilesToInChI | 1M | 946
ChemSmilesToInChIKey | 1M | 389
ChemSmilesToCanonical | 1M | 331

Butina cluster

Clusters molecules with the Butina algorithm. The only input is the desired similarity between molecules within a cluster, measured by the Tanimoto index.
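The clustering step can be sketched in a few lines of plain Python. This is a minimal sketch, assuming fingerprints are given as sets of on-bit indices (in practice they would come from a fingerprinter such as Morgan fingerprints). The names `tanimoto` and `butina_cluster` are hypothetical and are not the script's actual code. Two molecules count as neighbors when their Tanimoto similarity reaches the cutoff. The most connected unassigned molecule becomes a cluster centroid and takes all of its still unassigned neighbors.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto index |A ∩ B| / |A ∪ B| of two fingerprint bit sets (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def butina_cluster(fps: list[set[int]], similarity_cutoff: float) -> list[list[int]]:
    """Return clusters as lists of molecule indices, centroid first."""
    n = len(fps)
    # Neighbors: molecules whose similarity to molecule i reaches the cutoff.
    neighbors = [
        [j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= similarity_cutoff]
        for i in range(n)
    ]
    # Candidate centroids in order of decreasing neighbor count (stable for ties).
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    assigned: set[int] = set()
    clusters: list[list[int]] = []
    for centroid in order:
        if centroid in assigned:
            continue
        # The centroid takes all of its neighbors that no earlier cluster claimed.
        members = [centroid] + [j for j in neighbors[centroid] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


example_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 13}]
print(butina_cluster(example_fps, similarity_cutoff=0.5))  # [[0, 1, 2], [3, 4]]
```

This variant ranks candidate centroids only once. Some implementations can instead recompute neighbor counts after each cluster is removed, which may produce slightly different clusters.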

Chemical space using tSNE

tSNE (t-distributed Stochastic Neighbor Embedding) is a tool for visualizing high-dimensional data. It converts similarities between data points to joint probabilities, then minimizes the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and those of the original high-dimensional data. Because the tSNE cost function is non-convex, different initializations can lead to different results. The following image illustrates the use of tSNE to visualize chemical space.

[Image: Chemical Space Using tSNE]
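For reference, the textbook tSNE objective can be written as follows. Here $x_i$ are the input vectors (for chemical space, typically fingerprints or descriptors; the exact representation the script uses is not stated here), $y_i$ are their low-dimensional positions, $n$ is the number of molecules, and $\sigma_i$ is a per-point bandwidth usually tuned to a user-chosen perplexity.

```latex
p_{j\mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The Student-t kernel in $q_{ij}$ has heavier tails than the Gaussian used for $p_{ij}$, which lets dissimilar molecules sit far apart in the map. $C$ is minimized by gradient descent from a random starting layout, and since it is non-convex, separate runs can arrange clusters differently.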

Chemical space using UMAP

Uniform Manifold Approximation and Projection (UMAP) is a dimensionality reduction technique that, like tSNE, can be used for visualization, but also for general non-linear dimensionality reduction.

[Image: Chemical Space Using UMAP]

Filter by catalogs

Screens out undesirable molecules by checking them against the filter sets listed below.

Filter sets:

  • PAINS: Pan-assay interference compounds, separated into three sets (PAINS_A, PAINS_B, and PAINS_C).
  • BRENK: Filters out unwanted functionality associated with potential toxicity or unfavorable pharmacokinetics.
  • NIH: Annotated compounds with problematic functional groups.
  • ZINC: Filtering based on drug-likeness and unwanted functional groups.

Gasteiger partial charges

Visualizes atomic charges in a molecule.

[Image: Gasteiger Partial Charges]

Murcko scaffolds

Converts a column with molecules to Murcko scaffolds.

[Image: Murcko Scaffolds]
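The core idea of a Bemis-Murcko scaffold can be sketched on a plain heavy-atom graph. Repeatedly strip terminal (degree-1) atoms until none remain. What is left is the ring systems plus the linkers between them, and an acyclic molecule has an empty scaffold. This is a simplified sketch, assuming an adjacency-list input without hydrogens. The names `adjacency_from_bonds` and `murcko_scaffold_atoms` are hypothetical. The sketch also ignores bond orders, so it does not reproduce conventions some toolkits apply to exocyclic double-bonded atoms.

```python
def adjacency_from_bonds(n_atoms: int, bonds: list[tuple[int, int]]) -> dict[int, set[int]]:
    """Build an undirected adjacency map from a list of bonded atom index pairs."""
    adjacency: dict[int, set[int]] = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def murcko_scaffold_atoms(adjacency: dict[int, set[int]]) -> set[int]:
    """Atoms left after repeatedly removing terminal atoms: ring systems plus linkers."""
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}  # work on a copy
    terminal = [atom for atom, nbrs in graph.items() if len(nbrs) <= 1]
    while terminal:
        atom = terminal.pop()
        if atom not in graph:  # already removed earlier
            continue
        for nbr in graph.pop(atom):
            graph[nbr].discard(atom)
            # A neighbor that became terminal is a side-chain atom too.
            if len(graph[nbr]) <= 1:
                terminal.append(nbr)
    return set(graph)


# Ethylbenzene skeleton: six-membered ring (atoms 0-5) with a two-carbon side chain (6, 7).
ethylbenzene = adjacency_from_bonds(
    8, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]
)
print(sorted(murcko_scaffold_atoms(ethylbenzene)))  # [0, 1, 2, 3, 4, 5]
```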

Mutate

Mutate molecules using different mechanisms:

  • Adding atoms
  • Adding bonds
  • Removing bonds

Mutations can be randomized with the randomize flag. When it is set, the mutation mechanism and the position of the mutation are chosen at random at each mutation step.
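A toy sketch of the randomized mutation loop is shown below. It assumes a molecule is represented as a list of element symbols plus a set of bonds stored as index pairs `(i, j)` with `i < j`. Everything here is hypothetical: the names `mutate`, `add_atom`, `add_bond`, and `remove_bond` only mirror the mechanisms listed above. Only the randomized mode is shown. The sketch does none of the valence or connectivity checks a real implementation would need.

```python
import random


def add_atom(atoms: list[str], bonds: set[tuple[int, int]], rng: random.Random) -> None:
    """Attach a new carbon atom to a randomly chosen existing atom."""
    anchor = rng.randrange(len(atoms))
    atoms.append("C")
    bonds.add((anchor, len(atoms) - 1))


def add_bond(atoms: list[str], bonds: set[tuple[int, int]], rng: random.Random) -> None:
    """Bond a randomly chosen pair of atoms that are not bonded yet (if any)."""
    free_pairs = [(i, j) for i in range(len(atoms)) for j in range(i + 1, len(atoms))
                  if (i, j) not in bonds]
    if free_pairs:
        bonds.add(rng.choice(free_pairs))


def remove_bond(atoms: list[str], bonds: set[tuple[int, int]], rng: random.Random) -> None:
    """Delete a randomly chosen bond (connectivity is not preserved in this toy)."""
    if bonds:
        bonds.discard(rng.choice(sorted(bonds)))


MECHANISMS = (add_atom, add_bond, remove_bond)


def mutate(atoms: list[str], bonds: set[tuple[int, int]], steps: int, seed=None):
    """Apply `steps` mutations, picking the mechanism and position at random each step."""
    if not atoms:
        raise ValueError("need at least one atom")
    rng = random.Random(seed)
    atoms, bonds = list(atoms), set(bonds)  # copies: the inputs stay unchanged
    for _ in range(steps):
        rng.choice(MECHANISMS)(atoms, bonds, rng)
    return atoms, bonds


mutant_atoms, mutant_bonds = mutate(["C", "C", "O"], {(0, 1), (1, 2)}, steps=5, seed=42)
print(mutant_atoms, sorted(mutant_bonds))
```

Passing a seed makes a randomized run reproducible, which is convenient when comparing mutation settings.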

Reactions

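The reaction template is given in SMARTS format. Depending on the matrixExpansion flag, reactants from the two sets are either combined as a matrix (every reactant of the first set with every reactant of the second) or paired sequentially.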

[Image: Reactions]
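The two ways of combining reactants can be sketched independently of the chemistry. This minimal sketch makes two assumptions. With matrixExpansion enabled, every reactant of the first set is paired with every reactant of the second (a Cartesian product). Otherwise, reactants are paired by position. The function name `reactant_pairs` is hypothetical. Applying the SMARTS template to each pair (for example with RDKit's `AllChem.ReactionFromSmarts` and `RunReactants`) is left out.

```python
from itertools import product


def reactant_pairs(first: list[str], second: list[str],
                   matrix_expansion: bool) -> list[tuple[str, str]]:
    """Return the (reactant 1, reactant 2) pairs a two-component reaction would be run on."""
    if matrix_expansion:
        # Matrix: every reactant of the first set meets every reactant of the second set.
        return list(product(first, second))
    # Sequential: the i-th reactant of the first set meets the i-th of the second.
    if len(first) != len(second):
        raise ValueError("sequential pairing needs reactant sets of equal length")
    return list(zip(first, second))


amines = ["CN", "CCN"]
acids = ["CC(=O)O", "OC(=O)c1ccccc1"]
print(len(reactant_pairs(amines, acids, matrix_expansion=True)))  # 4
print(reactant_pairs(amines, acids, matrix_expansion=False))
```

Raising an error on unequal lengths is a choice of this sketch. `zip` alone would silently drop the unmatched reactants.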

Similarity maps using fingerprints

Visualizes the atomic contributions to the similarity between a molecule and a reference molecule.

[Image: Similarity Maps Using Fingerprints]
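The atomic contributions can be sketched as follows. An atom's weight is the drop in similarity to the reference molecule when the fingerprint bits that involve that atom are removed. A positive weight means the atom supports the similarity, and a negative weight means it works against it. This sketch assumes a hypothetical input format: a mapping from each on-bit of the probe molecule to the atoms whose environment set that bit. The function names are illustrative. Normalizing the weights and coloring the 2D depiction are omitted.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two bit sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def atomic_weights(bit_atoms: dict[int, set[int]], reference_bits: set[int]) -> dict[int, float]:
    """Per-atom weight: similarity with all bits minus similarity without the atom's bits."""
    base = tanimoto(set(bit_atoms), reference_bits)
    weights: dict[int, float] = {}
    for atom in sorted(set().union(*bit_atoms.values())):
        # Keep only the bits that this atom did not help to set.
        reduced = {bit for bit, env in bit_atoms.items() if atom not in env}
        weights[atom] = base - tanimoto(reduced, reference_bits)
    return weights


# Probe molecule: bit 1 comes from atom 0, bit 2 from atoms 0 and 1, bit 3 from atom 2.
probe = {1: {0}, 2: {0, 1}, 3: {2}}
print(atomic_weights(probe, reference_bits={1, 2}))
```

Here atom 2 gets a negative weight. Its only bit is missing from the reference, so removing it raises the similarity.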

Solubility prediction

The model was trained with the H2O modeling engine on the "Solubility Train" dataset
(#{x.Demo:SolubilityTrain."Solubility Train"}), using Generalized Linear Modeling (GLM).

Molecular descriptors used in the model:

  • MolWt: Molecular weight
  • Ipc: The information content of the coefficients of the characteristic polynomial of the adjacency matrix of a hydrogen-suppressed graph of a molecule
  • TPSA: Topological polar surface area
  • LabuteASA: Labute's approximate surface area
  • NumHDonors: Number of hydrogen bond donors
  • NumHAcceptors: Number of hydrogen bond acceptors
  • MolLogP: Wildman-Crippen LogP value
  • HeavyAtomCount: Number of heavy atoms
  • NumRotatableBonds: Number of rotatable bonds
  • RingCount: Number of rings
  • NumValenceElectrons: Number of valence electrons
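The source does not state which GLM family was used. Assuming a Gaussian family with identity link (the usual choice for a continuous response), a trained GLM predicts solubility as an intercept plus a weighted sum of the descriptors listed above. The sketch below only illustrates that prediction step. `predict_solubility` is a hypothetical name. The coefficients are made-up placeholders for three of the descriptors, not the trained model's values.

```python
def predict_solubility(descriptors: dict[str, float],
                       coefficients: dict[str, float],
                       intercept: float) -> float:
    """Gaussian GLM with identity link: intercept + sum of coefficient * descriptor."""
    missing = sorted(set(coefficients) - set(descriptors))
    if missing:
        raise KeyError(f"missing descriptors: {missing}")
    return intercept + sum(coef * descriptors[name] for name, coef in coefficients.items())


# Made-up numbers for illustration only; these are NOT the trained model's values.
placeholder_coefficients = {"MolWt": -0.01, "MolLogP": -0.6, "TPSA": 0.005}
placeholder_descriptors = {"MolWt": 200.0, "MolLogP": 2.0, "TPSA": 50.0}
print(predict_solubility(placeholder_descriptors, placeholder_coefficients, intercept=0.5))
```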

USRCAT

USRCAT is an extension of the Ultrafast Shape Recognition (USR) algorithm, which is used for molecular shape-based virtual screening to discover new chemical scaffolds in compound libraries. USRCAT incorporates pharmacophoric information in addition to molecular shape, which enables it to distinguish between compounds with similar shapes but distinct pharmacophoric features.

[Image: USRCAT]
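USRCAT builds on the original USR shape descriptor, which can be sketched in a few lines. First, compute four reference points from a conformer's atom coordinates: the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one. Then summarize each atom-distance distribution by three moments. Finally, compare two molecules with an inverse mean-absolute-difference score. USRCAT repeats the moment calculation for pharmacophoric atom subsets (for example hydrophobic, aromatic, hydrogen bond donor, and acceptor atoms), which this sketch omits. Implementations differ in how they normalize the second and third moments. Using the standard deviation and the signed cube root of the third central moment is an assumption here, and the function names are hypothetical.

```python
import math

Point = tuple[float, float, float]


def _moments(distances: list[float]) -> list[float]:
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(distances)
    mean = sum(distances) / n
    variance = sum((d - mean) ** 2 for d in distances) / n
    third = sum((d - mean) ** 3 for d in distances) / n
    return [mean, math.sqrt(variance), math.copysign(abs(third) ** (1 / 3), third)]


def usr_descriptor(coords: list[Point]) -> list[float]:
    """Twelve USR values: three moments of the atom distances to four reference points."""
    n = len(coords)
    ctd = tuple(sum(p[k] for p in coords) / n for k in range(3))  # centroid
    d_ctd = [math.dist(p, ctd) for p in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(p, fct) for p in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptor: list[float] = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(p, ref) for p in coords]))
    return descriptor


def usr_similarity(a: list[float], b: list[float]) -> float:
    """USR score in (0, 1]: 1 / (1 + mean absolute difference of the descriptors)."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


conformer = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.2), (1.0, 1.0, 1.0)]
print(usr_similarity(usr_descriptor(conformer), usr_descriptor(conformer)))  # 1.0
```

Because only interatomic distances enter the descriptor, it does not change when a conformer is translated or rotated, so no alignment step is needed before comparing molecules.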
