Chemprop
Property prediction plays a key role in drug discovery and materials science, helping researchers estimate characteristics like toxicity, bioactivity, and solubility.
We have integrated Chemprop, a PyTorch-based framework for molecular property prediction, into our platform, so you can train and apply models in just a few clicks.
Model parameters
Parameter | Description | Default
---|---|---
dataset_type | Type of dataset (e.g., classification or regression). Determines the training loss function. | Regression |
metric | Evaluation metric; does not affect the training loss. If not set, defaults to AUC for classification and RMSE for regression. | None
multiclass_num_classes | Number of classes when running multiclass classification. | 3 |
num_folds | Number of folds when performing cross validation. | 1 |
data_seed | Seed for data splitting. For multiple folds, increments by 1 for each fold. | 0 |
split_sizes | Proportions for train/validation/test splits. | 0.8, 0.1, 0.1 |
split_type | Data splitting method (e.g., random, cross-validation). | Random |
activation | Activation function used in the model (e.g., ReLU, tanh). | ReLU |
atom_messages | Uses messages on atoms rather than bonds. | False |
message_bias | Adds bias to linear layers. | False |
ensemble_size | Number of models in the ensemble. | 1 |
message_hidden_dim | Hidden layer dimensionality in the message-passing network. | 300 |
depth | Number of message-passing steps. | 3 |
dropout | Dropout probability for training. | 0.0 |
undirected | Treats bonds as undirected by summing the two directed bond message vectors. | False
ffn_hidden_dim | Hidden dimension size for the feed-forward network. | 300 |
ffn_num_layers | Number of layers in the feed-forward network after message-passing encoding. | 2 |
epochs | Total number of training epochs. | 50 |
batch_size | Batch size for training. | 64 |
warmup_epochs | Number of epochs during which the learning rate increases linearly from init_lr to max_lr; afterwards it decays exponentially to final_lr. | 2.0
init_lr | Starting learning rate for training. | 0.0001 |
max_lr | Peak learning rate for training. | 0.001 |
final_lr | Ending learning rate after decay. | 0.0001 |
no_descriptor_scaling | Disables feature scaling for descriptors. | False |
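
The platform runs Chemprop under the hood. If you want to reproduce a comparable configuration with the Chemprop library directly, the table's settings map onto Chemprop's own training arguments. Below is a minimal sketch, assuming Chemprop 1.x; the library's flag names differ slightly from the platform's parameter names (for example, message_hidden_dim is written as --hidden_size here), and data.csv / model_dir are hypothetical paths:

```python
import chemprop  # assumes Chemprop 1.x (pip install chemprop)

# Hypothetical paths; replace with your own dataset and output directory.
arguments = [
    '--data_path', 'data.csv',
    '--save_dir', 'model_dir',
    # Settings mirroring the defaults in the table above
    # (library flag names differ from the platform parameter names).
    '--dataset_type', 'regression',        # dataset_type
    '--num_folds', '1',                    # num_folds
    '--seed', '0',                         # data_seed
    '--split_type', 'random',              # split_type
    '--split_sizes', '0.8', '0.1', '0.1',  # split_sizes
    '--activation', 'ReLU',                # activation
    '--ensemble_size', '1',                # ensemble_size
    '--hidden_size', '300',                # message_hidden_dim
    '--depth', '3',                        # depth
    '--dropout', '0.0',                    # dropout
    '--ffn_hidden_size', '300',            # ffn_hidden_dim
    '--ffn_num_layers', '2',               # ffn_num_layers
    '--epochs', '50',                      # epochs
    '--batch_size', '64',                  # batch_size
    '--warmup_epochs', '2.0',              # warmup_epochs
    '--init_lr', '1e-4',                   # init_lr
    '--max_lr', '1e-3',                    # max_lr
    '--final_lr', '1e-4',                  # final_lr
]

# Parses and validates the configuration; training itself is shown in the
# sketch under the Train section below.
args = chemprop.args.TrainArgs().parse_args(arguments)
```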
Train
To train a model with Chemprop (a scripted equivalent is sketched after these steps):

- Prepare the dataset
  - Make sure it includes a column with SMILES strings or molblocks.
- Set up the model
  - Go to ML > Models > Train Model…
  - Select the dataset, then specify the target column (the values to predict) and the feature column (SMILES or molblocks).
- Choose Chemprop as the model engine
  - From the available engines, select Chemprop.
- Adjust the model settings
  - Configure training parameters (epochs, batch size, learning rate) and evaluation metrics as needed.
- Train the model
  - Click Train to start training and generate the model.
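
The same workflow can be scripted outside the platform with the Chemprop Python package. A minimal sketch, assuming Chemprop 1.x and a hypothetical train.csv with a SMILES column and a target column; the hyperparameters from the table above can be appended to the argument list:

```python
import chemprop  # assumes Chemprop 1.x

# Hypothetical file names; the CSV needs a SMILES column and a target column.
arguments = [
    '--data_path', 'train.csv',
    '--dataset_type', 'regression',
    '--save_dir', 'chemprop_model',
    '--epochs', '50',
    '--batch_size', '64',
]

args = chemprop.args.TrainArgs().parse_args(arguments)

# Trains the model (with cross-validation when num_folds > 1) and returns
# the mean and standard deviation of the validation score.
mean_score, std_score = chemprop.train.cross_validate(
    args=args, train_func=chemprop.train.run_training
)
```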
Predict
Once training is complete, follow these steps to apply the trained model (a scripted equivalent is sketched after these steps):

- Select the dataset
  - Choose the table that contains the feature column (SMILES or molblocks) to predict on.
- Apply the model
  - Go to ML > Models > Apply Model.
  - Select the trained model you want to use.
- Generate predictions
  - Click Apply Model to generate predictions for the selected dataset.
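
Predictions can likewise be generated from a script. A minimal sketch, assuming Chemprop 1.x and the hypothetical checkpoint directory from the training sketch above:

```python
import chemprop  # assumes Chemprop 1.x

# Hypothetical file names; test.csv needs the same SMILES column used for training.
arguments = [
    '--test_path', 'test.csv',
    '--preds_path', 'predictions.csv',
    '--checkpoint_dir', 'chemprop_model',
]

args = chemprop.args.PredictArgs().parse_args(arguments)

# Writes predictions to predictions.csv and also returns them as a list.
preds = chemprop.train.make_predictions(args=args)
```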