
Interactive modeling

The predictive modeling toolkit provides an interactive visualization tool for models.

Many classical ML models can be trained quickly. In these cases, Datagrok creates a UI displaying all model parameters.

This approach is particularly useful for quick prototyping and data analysis. Datagrok automatically analyzes the data and the predictions to provide the user with the most relevant models and insights, so the machine learning itself is done automatically. The user then has all the tools to analyze the results and continue exploring the problem with the guidance of the platform.

Workflow

Model setup begins with data configuration. First, select a table to work with, then choose a target column and feature columns.

Selecting data establishes the predictive problem. Next, configure the model engine (e.g., its source or architecture) and hyperparameters. Datagrok dynamically updates the model as parameters are adjusted. This makes Datagrok an interactive playground for modeling.
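The setup steps above can be pictured as a small configuration object that is checked before training. The sketch below is only an illustration: `ModelSetup`, `validateSetup`, and every column, table, and engine name in it are hypothetical and are not part of the Datagrok API.

```typescript
// Hypothetical shape of an interactive-modeling setup (not the Datagrok API).
interface ModelSetup {
  table: string;      // table to work with
  target: string;     // target column
  features: string[]; // feature columns
  engine: string;     // model engine
  hyperparameters: {[name: string]: number | string | boolean};
}

// Returns a list of problems with the setup; an empty list means it is ready to train.
function validateSetup(setup: ModelSetup, columns: string[]): string[] {
  const problems: string[] = [];
  if (columns.indexOf(setup.target) < 0)
    problems.push(`Target column "${setup.target}" is not in the table`);
  if (setup.features.length === 0)
    problems.push('At least one feature column is required');
  if (setup.features.indexOf(setup.target) >= 0)
    problems.push('The target column cannot also be a feature');
  for (const feature of setup.features) {
    if (columns.indexOf(feature) < 0)
      problems.push(`Feature column "${feature}" is not in the table`);
  }
  return problems;
}

// Illustrative values only.
const exampleSetup: ModelSetup = {
  table: 'demog',
  target: 'weight',
  features: ['age', 'height'],
  engine: 'linear-regression',
  hyperparameters: {regularization: 0.1},
};
const setupProblems = validateSetup(exampleSetup, ['age', 'height', 'weight', 'sex']);
```

In an interactive setting, a check like this would be rerun on every change to the inputs, and the model retrained only when the list of problems is empty.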

Model autoselection and autoconfiguration

Based on the data structure, Datagrok lists the models that match the data and preselects the best one.
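One way to think about autoselection is as two steps: infer the problem type from the target column, then pick the best-scoring model among those that support it. The sketch below assumes that each candidate carries a validation score where higher is better. The types, function names, model names, and scores are hypothetical and do not describe how Datagrok ranks models internally.

```typescript
type ColumnKind = 'numerical' | 'categorical';
type ProblemType = 'regression' | 'binary classification' | 'multiclass classification';

// Infers the predictive problem from the target column.
function inferProblem(kind: ColumnKind, distinctValues: number): ProblemType {
  if (kind === 'numerical') return 'regression';
  return distinctValues === 2 ? 'binary classification' : 'multiclass classification';
}

// Hypothetical candidate model; the score is assumed to be "higher is better".
interface Candidate {
  name: string;
  supports: ProblemType[];
  score: number;
}

// Keeps only the candidates that support the problem and returns the best-scoring one.
function pickBest(candidates: Candidate[], problem: ProblemType): Candidate | null {
  const suitable = candidates.filter((c) => c.supports.indexOf(problem) >= 0);
  if (suitable.length === 0) return null;
  return suitable.reduce((best, c) => (c.score > best.score ? c : best));
}

// Illustrative candidates and scores only.
const candidates: Candidate[] = [
  {name: 'Linear regression', supports: ['regression'], score: 0.71},
  {name: 'Gradient boosting', supports: ['regression', 'binary classification', 'multiclass classification'], score: 0.84},
  {name: 'Logistic regression', supports: ['binary classification'], score: 0.9},
];
const bestForRegression = pickBest(candidates, inferProblem('numerical', 120));
```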

Tips as you go

The platform suggests data transformations, model configuration changes, or insights based on the situation.

For example, Datagrok may:

  • Show correlated columns
  • Suggest handling of missing values
  • Analyze false positive and false negative errors for binary classification problems
Example tips: class imbalance, ignore missing values, low precision.
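A tip of this kind can be derived from simple statistics of the target column. The sketch below counts missing target values and compares the sizes of the largest and smallest classes. The function name, the messages, and the default ratio threshold of 3 are illustrative assumptions, not Datagrok's actual rules.

```typescript
// Hypothetical tip generator; the threshold and messages are illustrative assumptions.
function suggestTips(target: (string | null)[], maxClassRatio: number = 3): string[] {
  const tips: string[] = [];
  const counts: {[label: string]: number} = {};
  let missing = 0;

  // Count missing values and the size of every class.
  for (const value of target) {
    if (value === null) {
      missing++;
      continue;
    }
    counts[value] = (counts[value] || 0) + 1;
  }

  if (missing > 0)
    tips.push(`Missing target values: ${missing}; consider ignoring these rows`);

  // Compare the largest class with the smallest one.
  const sizes = Object.keys(counts).map((label) => counts[label]);
  if (sizes.length > 1) {
    const largest = sizes.reduce((a, b) => Math.max(a, b));
    const smallest = sizes.reduce((a, b) => Math.min(a, b));
    const ratio = largest / smallest;
    if (ratio > maxClassRatio)
      tips.push(`Class imbalance: the largest class is ${ratio.toFixed(1)}x the smallest`);
  }
  return tips;
}

// Eight rows of class "a", two rows of class "b", and one missing value.
const exampleTips = suggestTips(['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', null]);
```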

Model comparison

Datagrok allows users to play with models by changing parameters on the go. To preserve effective configurations, models can be saved to the comparison tool.

The comparison tool stores model parameters for future reference, so users can compare configurations side by side and keep only the best ones.
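Conceptually, such a tool keeps a snapshot of each saved configuration together with a quality metric. The sketch below is a hypothetical in-memory version that assumes a single metric where higher is better. `SavedModel` and `ModelComparison` are illustrative names, not Datagrok classes.

```typescript
// Hypothetical saved configuration; the metric is assumed to be "higher is better".
interface SavedModel {
  name: string;
  hyperparameters: {[name: string]: number | string | boolean};
  metric: number;
}

class ModelComparison {
  private saved: SavedModel[] = [];

  // Stores a copy, so later parameter changes do not alter the saved snapshot.
  save(model: SavedModel): void {
    this.saved.push({...model, hyperparameters: {...model.hyperparameters}});
  }

  // Returns the saved configurations, best metric first.
  ranked(): SavedModel[] {
    return this.saved.slice().sort((a, b) => b.metric - a.metric);
  }
}

// Illustrative values only.
const comparison = new ModelComparison();
const params = {depth: 3};
comparison.save({name: 'shallow tree', hyperparameters: params, metric: 0.78});
params.depth = 8; // keep tuning after saving
comparison.save({name: 'deep tree', hyperparameters: params, metric: 0.85});
const rankedModels = comparison.ranked();
```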

Visualizations

Datagrok offers a wide range of context-aware visualizations. In the table below, $X$ represents the feature columns and $y$ denotes the target column.

| Visualization | Description | Shown for | Read more |
|---|---|---|---|
| Scatter plot (regression) | Visualizes predicted vs actual values. | Regression problems ($y$ is numerical) | |
| Residuals plot | The difference between the prediction and the real value. | Regression problems ($y$ is numerical) | Errors and residuals |
| Scatter plot (classification) | Visualizes predicted vs actual values using colors for classes. | Classification problems ($y$ is categorical) | |
| ROC curve | Trade-offs between true and false positive rates for classifiers. | Binary classification problems ($y$ is categorical with 2 classes) | ROC curve |
| PC plot | Parallel coordinates plot. | $X$ has more than 2 and fewer than 11 numerical columns | |
| Statistics | Statistics of the $y$ column. | Regression problems ($y$ is numerical) | |
| Distribution | Distribution of the predicted values and distribution of the values in $y$. | Regression problems ($y$ is numerical) | |
| Confusion matrix | Rows represent actual classes, columns represent predicted classes, and relevant metrics are displayed. | Classification problems ($y$ is categorical) | Confusion matrix |
| Correlation plot | Shows feature interdependencies. | $X$ has more than 10 numerical columns | |
| Performance metrics | Aggregated performance statistics (e.g., R², accuracy). | Always | |
| Wrong predictions | Highlights incorrectly classified instances of the data. | Classification problems ($y$ is categorical) | |
| Custom visualizations | Package-defined visualizations for specific models. | Specific models | |
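The conditions in the table above can be read as a set of rules over the target type and the number of numerical feature columns. The sketch below encodes those rules directly. The `Problem` interface and the `visualizationsFor` function are hypothetical names used for illustration, and custom visualizations are left out because they depend on the specific model.

```typescript
// Hypothetical description of the predictive problem.
interface Problem {
  targetKind: 'numerical' | 'categorical'; // type of y
  targetClasses: number;                   // distinct values in y (used for categorical y only)
  numericalFeatures: number;               // number of numerical columns in X
}

// Lists the visualizations whose conditions from the table are met.
function visualizationsFor(p: Problem): string[] {
  const result = ['Performance metrics']; // always shown
  if (p.targetKind === 'numerical') {
    result.push('Scatter plot (regression)', 'Residuals plot', 'Statistics', 'Distribution');
  } else {
    result.push('Scatter plot (classification)', 'Confusion matrix', 'Wrong predictions');
    if (p.targetClasses === 2) result.push('ROC curve');
  }
  if (p.numericalFeatures > 2 && p.numericalFeatures < 11) result.push('PC plot');
  if (p.numericalFeatures > 10) result.push('Correlation plot');
  return result;
}

// A regression problem with five numerical features.
const regressionViews = visualizationsFor({targetKind: 'numerical', targetClasses: 0, numericalFeatures: 5});
// A binary classification problem with twelve numerical features.
const binaryViews = visualizationsFor({targetKind: 'categorical', targetClasses: 2, numericalFeatures: 12});
```

Here `regressionViews` includes the PC plot but not the correlation plot, while `binaryViews` includes the ROC curve and the correlation plot but not the PC plot.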

Custom visualizations

Custom viewers can be defined for specific models using the JS API. This requires a setup that follows rules similar to those described in the custom models guide.


import * as DG from 'datagrok-api/dg';

//name: visualize
//meta.mlname: $MODEL_NAME
//meta.mlrole: visualize
//input: dataframe df
//input: column targetColumn
//input: column predictColumn
//input: dynamic model
//output: dynamic widget
export async function visualize(df: DG.DataFrame, targetColumn: DG.Column, predictColumn: DG.Column, model: any): Promise<any> {
  // Create the viewer and return its root element as the widget
  const view: DG.JsViewer = new DG.JsViewer();
  return view.root;
}

These viewers are automatically added to the interactive training view.