Cheminformatics
These programming exercises are designed to introduce developers to the cheminformatics capabilities of the Datagrok platform. They build on the knowledge you gained in the earlier exercises.
Basic exercises in cheminformatics
Exercise 1: Search for most common structures
You will learn: How to employ functions from external packages in your own package.
Prerequisites: "Cheminformatics".
Statement of the problem. Write a function that reads a file containing SMILES, determines the maximum common substructure (MCS) of the molecules, and computes a similarity score between each molecule and the MCS.
Input data. Files > App Data > Chem > sars_small.csv
Solution, step-by-step.
-
Let's call our function findSimilarToMCS and place its definition in ./src/package.ts inside our package. This function takes a single input, a dataframe df. For the sake of simplicity, we suppose that the column with SMILES is df.col('smiles'):
//name: findSimilarToMCS
//input: dataframe df
export async function findSimilarToMCS(df: DG.DataFrame): Promise<void> {
  // ... your code goes here
}
-
Employ the asynchronous function FindMCS from the Chem package. Since we're calling a function from an external package, we should use grok.functions.call. Note that the dataframe passed in is the function's own input, df:
const mcsValue = await grok.functions.call('Chem:FindMCS', {'smiles': 'smiles', 'df': df, 'returnSmarts': false});
-
Having obtained the string mcsValue, create a new column for df whose cells are filled with this value:
- Create an Array of the appropriate length, filled with mcsValue.
- Feed this array to the constructor DG.Column.fromList() to get the desired mcsCol object.
- Assign the semantic type Molecule to the newly created column by setting its col.semType property. Similarly, associate the Molecule cell renderer with the column with the help of the col.setTag(...) method.
-
To compute similarity scores, we can call the getSimilarities() function of the Chem package, which takes as its parameters the initial SMILES column and mcsValue. The function can be invoked as described in step 2.
-
The output of step 4 is a new dataframe, scoresDf, whose 0-th column contains the score values. This column, scoresCol, can be reached by means of the byIndex() method of the scoresDf.columns object.
-
Finally, insert the columns mcsCol and scoresCol into the dataframe, next to the position of the initial SMILES column. The df.columns.insert() method can help with this, if we cleverly specify the index at which the insertion should take place.
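The column bookkeeping in steps 3 and 6 can be sketched on plain arrays, without the platform. This is a minimal sketch, not Datagrok code: constantValues and insertAfter are hypothetical helpers, and the column names id, activity, mcs, and score, as well as the sample SMILES string, are placeholders. It illustrates that the insertion index is the position of the SMILES column plus one, and that each further column goes one position to the right of the previous one.

```typescript
// Plain-array stand-ins. These helpers are hypothetical and are NOT part of the Datagrok API.

/** Step 3: one copy of the MCS string per row of the table. */
function constantValues(rowCount: number, value: string): string[] {
  return new Array<string>(rowCount).fill(value);
}

/** Step 6: return a copy of `names` with `added` placed right after `anchor`. */
function insertAfter(names: string[], anchor: string, added: string[]): string[] {
  const anchorIdx = names.indexOf(anchor);
  if (anchorIdx < 0)
    throw new Error(`Column '${anchor}' not found`);
  const result = names.slice();
  // Each new name goes one position further right than the previous one,
  // so the inserted names keep their relative order.
  added.forEach((name, offset) => result.splice(anchorIdx + 1 + offset, 0, name));
  return result;
}

// Placeholder MCS value and placeholder column names, for illustration only.
const mcsValues = constantValues(3, 'c1ccccc1');
const order = insertAfter(['id', 'smiles', 'activity'], 'smiles', ['mcs', 'score']);

console.log(mcsValues.length); // 3
console.log(order); // ['id', 'smiles', 'mcs', 'score', 'activity']
```

In the package itself, the same index arithmetic would feed df.columns.insert(); check the exact signature of that method in the API reference before relying on it.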
Exercise 3: Train Model to Predict Activity
You will learn: How to train a model inside a package and use it to predict the activity of molecules.
Prerequisites: "Molecular fingerprints", "Cheminformatics".
-
Create a package with the name <yourFirstName>-cheminformatics.
-
Add a new function:
//name: TrainAndPredict
//input: dataframe train
//input: dataframe test
export async function TrainAndPredict(train: DG.DataFrame, test: DG.DataFrame): Promise<void> {
  // your code here
}
Here, train and test are the dataframes used for training and prediction, respectively.
-
Using grok.chem.descriptors, compute fingerprints for all molecules.
-
Use grok.ml.trainModel to train your model on the fingerprints so that it predicts the activity of a molecule. You can use dataset example
-
Using grok.ml.applyModel, apply the model to the test and training datasets. Check the accuracy of the model.
-
Using grok.shell.addTableView(datasetName), display the test dataset.
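The exercise does not say how to check the accuracy, so here is one possible sketch that works on plain arrays once the observed and predicted values have been read out of the dataframes. The helpers accuracy and rmse are hypothetical and not part of the Datagrok API, and the sample labels and numbers are made up for illustration. Which of the two applies depends on whether activity is a category or a number in your dataset.

```typescript
// Hypothetical helpers for checking predictions; NOT part of the Datagrok API.

/** Fraction of rows where the predicted label equals the observed one. */
function accuracy(observed: string[], predicted: string[]): number {
  if (observed.length === 0 || observed.length !== predicted.length)
    throw new Error('Inputs must be non-empty and of equal length');
  let hits = 0;
  for (let i = 0; i < observed.length; i++) {
    if (observed[i] === predicted[i])
      hits++;
  }
  return hits / observed.length;
}

/** Root-mean-square error, for the case where activity is numeric. */
function rmse(observed: number[], predicted: number[]): number {
  if (observed.length === 0 || observed.length !== predicted.length)
    throw new Error('Inputs must be non-empty and of equal length');
  let sumSq = 0;
  for (let i = 0; i < observed.length; i++) {
    const diff = observed[i] - predicted[i];
    sumSq += diff * diff;
  }
  return Math.sqrt(sumSq / observed.length);
}

// Made-up values: three of the four labels match.
const acc = accuracy(
  ['active', 'inactive', 'active', 'active'],
  ['active', 'inactive', 'inactive', 'active'],
);
// Made-up values: the only error is 2 on the last row.
const err = rmse([1, 2, 3], [1, 2, 5]);

console.log(acc); // 0.75
console.log(err); // square root of 4/3
```

Computing the metric separately for the training and test datasets makes overfitting visible: a model that scores much better on train than on test has likely memorized the training molecules.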