Manage Conda environments
Specify environment
Set an environment
parameter of the script
to a one-liner YAML following the standard Conda YAML config (omit it's name).
For example, we need to use the following Conda config:
name: envtest01
channels:
- Conda-forge
dependencies:
- python=3.8
- glom
- pip:
- requests
To use it in a script, specify it as follows:
#name: EnvTestInline
#environment: channels: [Conda-forge], dependencies: [python=3.8, glom, {pip: [requests]}]
#language: python
#output: string result
import re, requests
from glom import glom
import pandas as pd
target = {'a': {'b': {'c': 'd'}}}
result = glom(target, 'a.b.c') # returns 'd'
When the script runs the first time, Datagrok creates the environment on the Compute Virtual Machine, which may take up to several minutes. For all next script runs, Datagrok will reuse this environment.
Datagrok distinguishes in-place environments using MD5 hashes of their body strings. If there is a ready-to-use environment with the same environment config, Datagrok will reuse it.
Store the environment in a package
You may store your configurations in a package's environment
folder. They should use Conda YAML format.
You can create an environment YAML
manually
or
export it
form your existing Conda environment.
If the environment
tag in the script header is not specified, the script uses
the default configuration.
Here is an example of configuration for the Datagrok public repository. Also, a package can define its own configurations as well (see examples).
This is how to define the "Chemprop" environment in the script header:
#environment: Chemprop
In this case, the environment Chemprop
should be specified in a file
environments/Chemprop.yaml
inside the package where this script belongs.
Datagrok distinguishes package environments by their names. Datagrok will reuse a previously created environment for all subsequent runs with no delay.