Skip to main content

Manage Conda environments

Specify environment

Set an environment parameter of the script to a one-liner YAML following the standard Conda YAML config (omit it's name).

For example, we need to use the following Conda config:

name: envtest01
channels:
- Conda-forge
dependencies:
- python=3.8
- glom
- pip:
- requests

To use it in a script, specify it as follows:

#name: EnvTestInline
#environment: channels: [Conda-forge], dependencies: [python=3.8, glom, {pip: [requests]}]
#language: python
#output: string result

import re, requests
from glom import glom
import pandas as pd

target = {'a': {'b': {'c': 'd'}}}
result = glom(target, 'a.b.c') # returns 'd'
First launch may take time

When the script runs the first time, Datagrok creates the environment on the Compute Virtual Machine, which may take up to several minutes. For all next script runs, Datagrok will reuse this environment.

Datagrok distinguishes in-place environments using MD5 hashes of their body strings. If there is a ready-to-use environment with the same environment config, Datagrok will reuse it.

Store the environment in a package

You may store your configurations in a package's environment folder. They should use Conda YAML format. You can create an environment YAML manually or export it form your existing Conda environment.

If the environment tag in the script header is not specified, the script uses the default configuration.

tip

Here is an example of configuration for the Datagrok public repository. Also, a package can define its own configurations as well (see examples).

This is how to define the "Chemprop" environment in the script header:

#environment: Chemprop

In this case, the environment Chemprop should be specified in a file environments/Chemprop.yaml inside the package where this script belongs.

Datagrok distinguishes package environments by their names. Datagrok will reuse a previously created environment for all subsequent runs with no delay.