

A next-generation environment for scientific computing that leverages core Datagrok features, such as in-memory data engine, interactive visualizations, data access, machine learning, and enterprise features to enable developing, publishing, discovering, and using scientific applications:

  1. Functions and cross-language support
  2. Scalable and reproducible computations
  3. Web-based UI that could be autogenerated, customized, used on mobile devices, and shared as URL
  4. Model lifecycle (create, deploy, define environments, version, share, use, validate, update)
  5. Data access
  6. Metadata used by the model browser
  7. Integration options: REST API, JS API, embedding as iframe
  8. Leveraging the platform
  9. Logging, audit, and traceability
  10. Privileges and visibility
  11. Usage analysis
  12. Exploratory data analysis and Jupyter notebooks
  13. Roadmap

Most of the foundational functionality is implemented in the Datagrok core. Compute-specific enhancements, different function views, and analytical blocks are part of the Compute package (see the planning board).

Functions and cross-language support

Data access, computations, and visualizations are the cornerstones of scientific computing. In Datagrok, all of them are functions with the following features:

  • Advanced support for input and output parameters
    • Typed (cross-language support for scalars, vectors, dataframes, images)
    • Introspectable
    • Metadata-annotated
  • Dynamic discovery
  • Polymorphic execution (platform doesn't care which language the function is implemented in)
  • Annotated with metadata both on the function and on the parameter level
  • Could be serialized and re-executed at a later point.
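The properties listed above can be illustrated with a minimal Python sketch. It is not how Datagrok implements functions: `REGISTRY`, `register`, `describe`, `serialize_call`, and `replay` are hypothetical names, and the `cmax` formula is a toy. The sketch only shows what "typed, introspectable, metadata-annotated, and re-executable" could mean in practice.

```python
import inspect
import json

REGISTRY = {}  # hypothetical in-process registry, not Datagrok's


def register(**meta):
    """Attach metadata to a function and make it discoverable by name."""
    def wrap(fn):
        fn.meta = meta
        REGISTRY[fn.__name__] = fn
        return fn
    return wrap


@register(domain="PKPD", tags=["model"])
def cmax(dosage: float, clearance: float) -> float:
    # Toy formula, for illustration only
    return dosage / clearance


def describe(name):
    """Introspect parameter names and their declared types."""
    sig = inspect.signature(REGISTRY[name])
    return {p.name: p.annotation.__name__ for p in sig.parameters.values()}


def serialize_call(name, **kwargs):
    """Persist a call as JSON so that it can be re-executed later."""
    return json.dumps({"func": name, "args": kwargs}, sort_keys=True)


def replay(payload):
    """Re-execute a previously serialized call."""
    call = json.loads(payload)
    return REGISTRY[call["func"]](**call["args"])
```

Because a call is reduced to a function name plus a set of typed arguments, the same record can drive UI generation, auditing, and later re-execution.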

A function could be written in any language that Datagrok supports. Typically, computations are developed in Python, R, Julia, Matlab/Octave, JavaScript, or C++. Data access usually uses SQL, SPARQL, OpenAPI, or JavaScript.

Functions could either be registered manually, or published as part of a package, which usually is kept under the source control system. Once a package is published, its content is discoverable (subject to checking privileges).

This is a powerful concept that allows us to approach scientific computations in a novel way, and unlock plenty of interesting features covered below, such as scalable computations, reproducibility, automatic UI generation, audit, sensitivity analysis, different analytical blocks applicable to any function, optimization, and others.

Scalable computations

Depending on the underlying language, a function could be executed on the client side, server side, both, or either.

JavaScript and C++ (compiled to WebAssembly) could be executed right in the browser. The upside is unmatched responsiveness, data locality, and computation locality. The downside is that many of the popular statistical and modeling methods are not yet available in these languages. Note that while the computations are performed locally, proper audit and traceability still work (both input and output parameters could be sent to the server to be kept as a historical record).

R, Python, Julia, Matlab, and Octave are powerful languages with mature ecosystems of scientific libraries, and many organizations already have models implemented in them. They can only be executed on a server, which raises the question of scalability. Datagrok addresses it with a message queue architecture. Whenever a server-based function is invoked, its parameters are saved to a queue; one of the worker processes then picks up the task (such as running a Python function), executes it, and puts the results back. This architecture guarantees the following:

  • The platform won't get overloaded by trying to execute too many tasks at once
  • Scaling is as simple as adding more workers (which could be hosted externally if necessary)
  • A queue serves as a basis for logging, audit, and traceability
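The queue-and-workers pattern described above can be sketched in a few lines of Python. This is an illustration of the general technique using in-process threads, not Datagrok's actual message queue; `run_tasks` and its task tuple format are assumptions made for the example, and error handling is omitted.

```python
import queue
import threading


def run_tasks(tasks, n_workers=2):
    """Run (task_id, func, args) tuples on a fixed pool of worker threads."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more work for this worker
                pending.task_done()
                return
            task_id, func, args = item
            value = func(*args)
            with lock:
                results[task_id] = value
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        pending.put(task)
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    pending.join()
    for t in threads:
        t.join()
    return results
```

The number of tasks running at once never exceeds `n_workers`, so load stays bounded, and raising `n_workers` is the only change needed to scale out.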

User interface

Our goal is not computation for its own sake, but rather helping users derive actionable insights and supporting the decision-making process. The UI should be as easy to use as possible, tailored to the user's needs, and specific to the task. On the other hand, it should be clean, universal, and easy enough to be developed by a scientist without a deep understanding of the Datagrok platform. To satisfy these seemingly contradictory requirements, we developed a hybrid approach to building the UI, where the model author has full control over how custom the UI for a specific model should be. In the most standard case, there is no need to write a single line of code, as the UI is automatically generated based on the function signature. At the other end of the spectrum, you can take everything into your own hands and develop a completely custom UI. Anything in between is also possible.

Autogenerated UI

Very often, all that is needed for the model UI are the input fields for the corresponding function's parameters. In this case, Datagrok generates the UI automatically by constructing the corresponding input fields and output area with graphics and results, and bringing it to life by making it interactive. Additional parameters' metadata, such as units, category, description, input type (slider/combo box/etc), and others are also taken into account.
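As a rough illustration of the idea, the following Python sketch picks a widget for each parameter from its type and metadata, and groups the widgets by category. The widget names, the `min`/`max` keys, and the dictionary layout are assumptions made for this example; the platform's actual rules may differ.

```python
def pick_widget(param):
    """Choose an input widget for a parameter described as a dict."""
    if "choices" in param:
        return "combo box"
    if param["type"] in ("int", "double"):
        # A bounded numeric range is a natural fit for a slider
        return "slider" if {"min", "max"} <= set(param) else "number field"
    if param["type"] == "bool":
        return "checkbox"
    if param["type"] == "dataframe":
        return "table picker"
    return "text field"


def build_form(params):
    """Group (name, widget) pairs by the optional 'category' metadata."""
    form = {}
    for p in params:
        category = p.get("category", "General")
        form.setdefault(category, []).append((p["name"], pick_widget(p)))
    return form
```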

The following picture demonstrates a working PK/PD model implemented in R with the autogenerated UI (look at the script header area for details). While it looks very similar to the traditional Shiny app, the R script does not have to deal with the UI at all, which not only simplifies the development and maintenance, but also provides for the uniform experience.


A visual tool for annotating function parameters is currently in development.

Learn more: function parameters.

See also: auto-generating UI for dynamic data retrieval.

Custom UI

At the other end of the spectrum, the UI could be developed from scratch without any limitations, using vanilla JavaScript, a framework of your choice such as React, or the Datagrok UI toolkit. No matter what you choose, the Datagrok JS API is always available. For convenience, a repository of commonly used UI templates is provided.

Mobile devices

Datagrok UI is web-native, so it is possible to use the platform on mobile devices, including performing computations on the client-side. Even in the current state with no mobile-specific UI optimizations performed, the platform is already usable. This allows for a $100 tablet to be duct-taped on the instrument in the lab and run a simulation specific to that instrument - literally a fit-for-purpose solution!

Here's Andrew running the client-side Lotka-Volterra Model on the underpowered Nexus 7 from 2012:


Model lifecycle


Typically, there would be three parts of a model: data access, computations, and visualization.

The computational part of the model is a regular pure function (no UI, no data access) written in R, Python, Matlab, or any other language supported by Datagrok. It accepts inputs and returns outputs of specific types. No matter which dashboarding technology you use, untangling the computations from data access and UI is a good idea anyway - not only does it make your code cleaner, it also lets you reuse the logic. Even if you choose some other technology later, the effort won't be lost.

Once the computation code is developed (using your favorite coding environment), it's time to annotate the function and its parameters with the metadata. This is what it looks like for the PKPD R-based model:

#name: pkpd
#language: r
#tags: model
#meta.domain: PKPD
#input: double dosage = 1000 {category: Dosing options}
#input: string compartments {category: PK model; choices: ['2 compartment PK', '1 compartment PK']}
#input: double clearance = 2 {category: PK parameters}
#input: double interRate = 1 {category: PK parameters} [intercompartmental rate]
#input: double effRate = 0.2 {category: PD parameters} [effective compartment rate]
#input: double effect = 8 {category: PD parameters} [EC50]
#output: graphics PKPD
#output: double Cmax {units: nM}
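To make the annotation format concrete, here is a minimal Python sketch that parses the `#input` and `#output` lines of a header like the one above into dictionaries. The regular expression is inferred only from the lines shown here (type, name, optional default, optional `{key: value; ...}` block, optional `[description]`); it is an assumption for illustration and not the platform's parser.

```python
import re

# Pattern inferred from the header lines above; not an official grammar
LINE = re.compile(
    r"#(?P<kind>input|output):\s*(?P<type>\w+)\s+(?P<name>\w+)"
    r"(?:\s*=\s*(?P<default>[^{\[]+?))?"
    r"\s*(?:\{(?P<meta>[^}]*)\})?"
    r"\s*(?:\[(?P<desc>[^\]]*)\])?\s*$"
)


def parse_param(line):
    """Parse one '#input:' or '#output:' line, or return None for other lines."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    param = {k: v for k, v in m.groupdict().items()
             if v is not None and k != "meta"}
    # The braces hold "key: value" pairs separated by semicolons
    for pair in (m.group("meta") or "").split(";"):
        if ":" in pair:
            key, value = pair.split(":", 1)
            param[key.strip()] = value.strip()
    return param
```

All values are kept as strings; converting defaults and choices to typed values is left out to keep the sketch short.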


After that, let's proceed to deploying this model.


In the simplest case, deploying a model means saving a script with the #model tag - the platform takes care of the rest. This can be done either manually via the UI, or automatically:

  • Manual deployment: choose Functions | Scripts | New R script, paste the script in the editor area, and hit SAVE.
  • Automatic deployment: save the model as part of a package, and publish it.

Together with the script versioning and script environments features outlined below, this enables reproducibility of results.


Like most Datagrok objects, models are versionable, meaning that all the sources for the previously used versions are available, along with the audit trail of the changes. The current and all previously published versions are stored in the Datagrok metadata database.

Additionally, a source control system such as Git (for example, a repository hosted on Bitbucket) could be used as a source for publishing the packages. It is a good idea to use source control anyway, and Datagrok lets you publish packages that contain models directly from it.


Scripts could specify the required environment, such as the libraries used, their versions, the version of the language interpreter, etc. We use Conda environments for Python, and renv environments for R.


There are two aspects to sharing - enabling access via privileges, and providing a link to the model.

By default, a freshly onboarded model is accessible only to the author. To share it with others, use the built-in sharing mechanism. If a model is part of the package, you can set the desired audience there as well.

Providing a link is easy - each model could be shared via its URL. A model execution with specific input parameters could also be shared as a URL.
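The idea of encoding a run in a link can be sketched with Python's standard library. The base URL and the path layout below are placeholders chosen for the example, not the platform's real URL scheme; only the principle (model name plus input values in the query string) is taken from the text.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def run_link(base_url, model, **inputs):
    """Build a link that encodes the model name and its input values."""
    return f"{base_url}/{model}?{urlencode(inputs)}"


def read_link(url):
    """Recover the model name and the inputs (as strings) from such a link."""
    parts = urlsplit(url)
    model = parts.path.rsplit("/", 1)[-1]
    inputs = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return model, inputs
```

Anyone opening the link gets the same model with the same inputs, provided they have the privileges to see it.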

Data access

The platform lets you seamlessly access any machine-readable data source, such as databases, web services, and files (either on network shares or in S3). To make a model retrieve the input data from a data source, annotate the input parameter with the corresponding parameterized query. Since both queries and models are functions, the platform can automatically generate a UI that contains both the input-specific and the computation-specific parts.

By untangling the computation from the data access, implementing both of them as pure functions, and eliminating the hardcoded UI altogether, we can now create powerful, interactive scientific applications without having to write a single line of UI code. These applications also automatically benefit from all other cross-cutting features.
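The separation described above can be illustrated in plain Python with an in-memory SQLite database. The table, its columns, and the function names are hypothetical; the point is only that the query function does no computation and the computation does no I/O, so either one can be replaced, versioned, or tested on its own.

```python
import sqlite3
from statistics import mean


def query_density(conn, powder):
    """Data access: a parameterized query, with no computation inside."""
    rows = conn.execute(
        "SELECT density FROM measurements WHERE powder = ? ORDER BY id",
        (powder,),
    )
    return [r[0] for r in rows]


def summarize(values):
    """Computation: a pure function of its inputs, with no I/O."""
    return {"n": len(values), "mean": mean(values)}


def demo():
    """Wire the two parts together against a small made-up dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurements "
        "(id INTEGER PRIMARY KEY, powder TEXT, density REAL)"
    )
    conn.executemany(
        "INSERT INTO measurements (powder, density) VALUES (?, ?)",
        [("A", 1.0), ("A", 3.0), ("B", 10.0)],
    )
    return summarize(query_density(conn, "A"))
```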

The following example illustrates this. Suppose we want to develop an R-based simulation against the freshest data from the database. This requires two steps: creating a parameterized query, and creating a computation script. Here are the query, the computation, and the automatically generated end result:


Note that the Powder and Metal inputs above have lists of allowed values that were retrieved dynamically by executing the specified PowderNames and Metals queries. If these queries slow the UI down, consider caching the results.

Parameterized queries work via Datagrok's data access mechanism, allowing you to benefit from other access-related features.

Reproducible computations

Detaching the computations from the data access, having versionable functions (both accessors and computations), and the ability to persist snapshots of input and output parameters allows us to do any of the following:

  • Run the model against the latest data
  • See historical data (both inputs and outputs)
  • Correlate historical results against changes in data accessors and computations
  • Analyze longitudinal changes of the model output
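A minimal sketch of the underlying idea, in Python: every run is stored as a snapshot of the function name, its version, the inputs, and the outputs. The `RunLog` class and its in-memory list are stand-ins invented for this example; in the platform, this information lives in the metadata database.

```python
import json


class RunLog:
    """Append-only record of function runs (a stand-in for a database table)."""

    def __init__(self):
        self._records = []

    def record(self, func, version, inputs):
        """Execute func and persist a snapshot of its inputs and outputs."""
        outputs = func(**inputs)
        self._records.append(json.dumps(
            {"func": func.__name__, "version": version,
             "inputs": inputs, "outputs": outputs},
            sort_keys=True,
        ))
        return outputs

    def history(self, name):
        """Return all snapshots for a function, oldest first."""
        runs = [json.loads(r) for r in self._records]
        return [r for r in runs if r["func"] == name]


def double(x):
    # Toy computation used only to exercise the log
    return 2 * x
```

With such records, comparing outputs across versions for the same inputs, or across time for the same version, becomes a simple query over the history.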


As with any other object in Datagrok, models could be annotated with tags (single words such as #chem) and parameters (key-value pairs). Parameters could also be combined into schemas. This helps keep things organized and discoverable. The model browser makes heavy use of this feature.

Tags and parameters could either be edited manually in the model's context panel, or specified along with the model body. Here is the corresponding section from the Lotka-Volterra model (full code here):

//name: Lotka-Volterra
//tags: model, simulation
//meta.domain: Nonlinear dynamics

Model browser

Model browser helps you easily discover and execute models. Similarly to modern file explorers, models could be rendered either as a list, as a grid, or as tiles. On top, there is a free-text search field that allows you to search in the following modes:

  • by name (example: logistic)
  • by tag (example: #chem)
  • by meta parameter (example: domain=bio)
  • by attributes (example: created > -4d)

To open a model, double-click on it.


Analytical blocks

No matter which domain you are working in, which language you program in, or what type of model you build, quite often you need the same set of tools (including visual tools) to work with data efficiently. Naturally, it makes sense to implement these algorithms just once, and then use them everywhere. Here are some examples:

  1. Imputation of missing values
  2. Outlier detection
  3. Multivariate analysis
  4. Time series analysis
  5. Validators

The fact that the typical analysis is an introspectable workflow consisting of functions passing the data helps us deal with that in a declarative manner.


Datagrok was designed to be as extensible and easy to integrate with as possible, so out of the box we get many platform integration capabilities such as authentication, data access, and many others. In addition to that, there are some capabilities specific to models: REST API and embedding as iframe.


Once registered, each function gets assigned a REST API endpoint that allows external code to execute it by passing the input parameters along with the authentication token. This makes scientific methods available for external consumption as soon as they are registered. To learn how to use it, click on the function and expand the REST pane on the context panel on the right. Both JavaScript and curl samples are provided.
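For orientation, here is a Python sketch of what such a call could look like from external code. It only prepares the request and does not send it. The URL layout, the JSON body, and the use of the `Authorization` header are assumptions made for the example; the exact endpoint and request format for a given function are the ones shown in its REST pane.

```python
import json
import urllib.request


def build_call(base_url, func_name, token, **inputs):
    """Prepare (but do not send) a POST request carrying inputs and a token."""
    return urllib.request.Request(
        url=f"{base_url}/{func_name}",  # placeholder path, see the REST pane
        data=json.dumps(inputs).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would execute the call against a real server.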



JavaScript-based apps have access to the whole Datagrok platform via the JS API.

Embedding as iframe

Sometimes, an app has to be included in an external web page. The simplest way to achieve this is via the iframe element. To embed a model, click on the hamburger icon in the top left corner and choose Embed... You will see a dialog with the HTML code on top that you can use to embed the model into an external site. Note that you can change parameter values right in the URL.

At the bottom, you see a preview of the result.


Leveraging the platform

The computation engine utilizes the power of the Datagrok platform, which brings plenty of benefits:

  • Not having to reinvent the wheel
  • Users don't have to switch tools anymore

Logging, audit, and traceability

Out-of-the-box, the platform provides audit and logging capabilities, and when the model is deployed, we get the following automatically:

  • See who created, edited, deployed, and used the model
  • Analyze historical input and output parameters
  • See how long computations took (and correlate with input parameters if needed)

All function invocation-specific data resides in the Datagrok metadata db (Postgres) in a structured, machine-readable way. We can also tune what needs to be persisted and where on a per-model basis.


Privileges and visibility

Datagrok has a built-in role-based privileges system that is used to define who can see, execute, or edit models. The same mechanism is used for the data access control.
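A role-based check of this kind can be sketched in Python as follows. The `Model` class, the action names, and the `can` function are hypothetical and only illustrate the principle that a model starts out accessible to its author alone and is opened to others by granting actions to roles.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    name: str
    author: str
    # role -> set of granted actions, e.g. {"chemists": {"view", "execute"}}
    grants: dict = field(default_factory=dict)


def can(user, roles, model, action):
    """Authors can do anything; others need a role that grants the action."""
    if user == model.author:
        return True
    return any(action in model.grants.get(role, set()) for role in roles)
```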

Exploratory data analysis

Perhaps the most commonly used data structure in computing is the dataframe. To analyze an input or output dataframe, click the `+` ("Add to workspace") icon. This opens the dataframe in the exploratory data analysis mode, where you can visualize the data, transform it, or perform more in-depth exploration, such as multivariate analysis.

In the picture below, we are exploring the result of the model execution. While the default output is visualized via the line chart, once we add the dataframe to the workspace, we can explore it in different ways, such as visualizing it on a scatter plot, histogram, or correlation plot.



Open-source, curated repository of scientific methods

We will be creating and maintaining a repository of popular scientific methods available to everyone under the MIT license.

WebAssembly-compiled pure functions (implemented in C++ or Rust) will be of particular interest, since this technology unlocks efficient computations on either client or server sides (you can choose whether to move data to the algorithm, or vice versa).

Industry adoption

Datagrok already has plenty of unique features making it interesting for large biopharma companies that deal with complex scientific data and complex IT landscape. A novel scientific computation engine is a value multiplier. Many of the pure functions we have already developed in the areas of cheminformatics, bioinformatics, clinical data analysis, biosignals, NLP, and machine learning could easily be converted to interactive models. Open-sourcing commonly used models will help us with the adoption.

UI Designer

Visual dashboard designer. The idea is to be able to drag-and-drop model inputs and outputs into a design surface, where they would become inputs, plots, or widgets.

Unit tests for computations

It would be nice to declaratively specify for a computation function a set of input parameters together with the expected output parameters. This way, we would be able to automatically check the model for correctness each time it changes.
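One possible shape for such declarative tests, sketched in Python: each case is plain data (inputs plus the expected output), and a generic checker runs them against the function. The case format, `check_cases`, and the `logistic` example function are assumptions for illustration; this is a roadmap item, so no platform syntax exists to quote.

```python
import math


def check_cases(func, cases, rel_tol=1e-9):
    """Run func on each case's inputs and collect the cases that mismatch."""
    failures = []
    for case in cases:
        actual = func(**case["inputs"])
        if not math.isclose(actual, case["expected"], rel_tol=rel_tol):
            failures.append({**case, "actual": actual})
    return failures


def logistic(x, k=1.0):
    # Example computation under test
    return 1.0 / (1.0 + math.exp(-k * x))


CASES = [
    {"inputs": {"x": 0.0}, "expected": 0.5},
    {"inputs": {"x": 0.0, "k": 3.0}, "expected": 0.5},
]
```

Because the cases are data rather than code, they could be stored next to the function's metadata and re-run automatically whenever a new version is published.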

Input providers

Produce inputs to functions in-place as outputs of other functions (aka input providers), including:

  • queries to databases
  • dialog-based functions (outlier detection, data annotation)
  • queries to OpenAPI and REST endpoints
  • other computing functions with or without GUI

These may include UI parts as well. The input provider is specified as part of the Universal UI markup.

Compute Analytical blocks

Part of the Compute package:

  • Model browser
  • Outlier selector tool
  • Universal export tool
  • Step-by-step wizard for onboarding new models
  • Model renderers
  • Function views
    • Function parameter grid

Outlier detection

  • Automatic outlier detection
  • Manual outlier markup and annotation
  • Used as an input provider in other functions
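As an example of what automatic detection could do, the Python sketch below flags values outside the interquartile-range fences (Tukey's rule). This is one common technique chosen for illustration, not necessarily the method used by the Compute package.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]
```

The returned indices could then be reviewed and annotated manually before the cleaned data is passed on to another function.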

Design of experiment

Sensitivity analysis

  • Sample inputs:
    • by specified number of samples
    • by a specified distribution or within a range
    • for a specified set of scalar inputs and/or columns of the matrix input
  • Produce variability analysis for outputs based on the sampled inputs
  • Visualize the results of analysis with Datagrok viewers
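The sampling and variability steps above can be sketched in Python. The sketch assumes uniform sampling within a range for scalar inputs only; other distributions and column inputs are left out, and the function names are invented for the example.

```python
import random
from statistics import mean, pstdev


def sample_inputs(ranges, n_samples, seed=0):
    """Draw n_samples input sets, each value uniform within its (low, high)."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(n_samples)
    ]


def output_variability(func, fixed, ranges, n_samples=200, seed=0):
    """Vary the inputs in `ranges`, keep `fixed` constant, summarize outputs."""
    samples = sample_inputs(ranges, n_samples, seed)
    outputs = [func(**fixed, **s) for s in samples]
    return {
        "min": min(outputs),
        "max": max(outputs),
        "mean": mean(outputs),
        "std": pstdev(outputs),
    }
```

The fixed seed keeps the analysis itself reproducible, and the list of sampled inputs and outputs is tabular data that any viewer can plot.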

Input parameter optimization

Solve an inverse problem: find input conditions leading to specified output constraints.
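For a single scalar input and a continuous model, the inverse problem can be solved by bisection, as in the Python sketch below. It assumes the target output is bracketed by the outputs at the two ends of the search interval; real models with several inputs or constraints would need a general-purpose optimizer instead.

```python
def solve_input(func, target, low, high, tol=1e-9, max_iter=200):
    """Find x in [low, high] such that func(x) is close to target.

    Assumes func is continuous and that func(low) - target and
    func(high) - target have opposite signs (or one of them is zero).
    """
    f_low = func(low) - target
    f_high = func(high) - target
    if f_low * f_high > 0:
        raise ValueError("target is not bracketed by [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = func(mid) - target
        if abs(f_mid) < tol or (high - low) / 2 < tol:
            return mid
        if f_low * f_mid <= 0:
            high = mid  # the solution lies in the lower half
        else:
            low, f_low = mid, f_mid  # the solution lies in the upper half
    return (low + high) / 2
```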


  1. Persistent, shareable historical runs

    It is already possible to provide a link to a function (with specified input parameters in the URI), which will open a function view and run it.

    Once a certain version of a specific function is run with specific inputs, the result should be stored in the immutable database log along with the inputs. Later it will be used to verify the grounds for decisions made from these calculations.

  2. Scaling on demand

  3. Export and reporting

  4. Data annotation

  5. Test data for functions

  6. Functions versioning

  7. Audit