Complex data types

Working with dataframes

For tabular data, Datagrok supports dataframes as input and output parameters. You can pass a script a whole dataframe (the dataframe type), a single dataframe column (column), or a list of columns (column_list).

A simple example of dataframe usage is available on the Getting started with scripting page.

Let's modify the default example to accept and return both dataframes and scalars. We copy the original dataframe and add a new ID column to it. Also, we return the number of rows in the table.

Result
Python
R
JavaScript

When you run the script, you will see the following dialog:

dataframe-id-demo

Datagrok created the script UI, populated default values, and created popups with help text.

#name: DataframeIdDemo
#description: Adding ID column to a dataframe
#language: python
#tags: demo, dataframe
#input: dataframe table [Data table]
#input: string id_column = 'ID' [Name of ID column]
#input: string id_prefix = 'id_' [Prefix for ID column]
#output: dataframe new_table [New table with additional column]
#output: int last_row [number of last row]

new_table = table.copy()
l = len(new_table)
new_table[id_column] = [f"{id_prefix}{n:04}" for n in range(l)]
last_row = len(new_table)

#name: DataframeDemo
#description: Adding a new column to a dataframe
#language: r
#tags: demo, dataframe
#input: dataframe table [Data table]
#input: string id_column = 'ID' [Name of ID column]
#input: string id_prefix = 'id_' [Prefix for ID column]
#output: dataframe new_table [New table with additional column]
#output: int last_row [number of last row]

new_table <- table
new_table[id_column] <- paste0(id_prefix, 1:nrow(new_table))
last_row <- nrow(new_table)

//name: DataframeIdDemo
//description: Adding ID column to a dataframe
//language: javascript
//tags: demo, dataframe
//sample: cars.csv
//input: dataframe table [Data table]
//input: string id_column = 'model' [Name of ID column]
//input: string id_prefix = 'id_' [Prefix for ID column]
//output: dataframe new_table [New table with additional column]
//output: int last_row [number of last row]

const new_table = table.clone();
const last_row = new_table.rowCount;
new_table.col(id_column).init((i) => `${id_prefix}${i}`);

After running this script, Datagrok automatically opens the new dataframe. It contains an additional ID column with the generated row IDs.
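The three versions above label rows slightly differently. As a quick illustration in plain Python (no Datagrok or pandas required; `make_ids` is a hypothetical helper used only for this sketch), the format spec `{n:04}` from the Python script zero-pads the zero-based row index to four digits:

```python
def make_ids(prefix: str, row_count: int) -> list[str]:
    """Build one ID per row the same way the Python script above does."""
    # {n:04} pads the row index with leading zeros to a width of 4.
    return [f"{prefix}{n:04}" for n in range(row_count)]


ids = make_ids("id_", 3)
print(ids)  # ['id_0000', 'id_0001', 'id_0002']
```

By contrast, the R version numbers rows from 1 without padding (id_1, id_2, and so on), and the JavaScript version numbers them from 0 without padding.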

note

Datagrok's dataframe converts to:

Pandas dataframe for Python,
Native data frame for R,
Cell arrays for Octave,
DataFrame for Julia,
DG.DataFrame for JavaScript.

Case-insensitive column names

In Datagrok, unlike Python/R dataframes, column names are case-insensitive. If you return a dataframe with columns whose names differ only by letter case, Datagrok will automatically add a number to the column header to differentiate them.

To prevent any confusion or issues with column names, we recommend using unique names that are distinct regardless of case.
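One way to catch such names before returning a dataframe is to group the column names by a case-insensitive key. The sketch below uses only the standard library; `find_case_collisions` is a hypothetical helper, and using `str.casefold()` as the comparison is an assumption, since this page does not specify Datagrok's exact matching rules.

```python
from collections import defaultdict


def find_case_collisions(column_names):
    """Return groups of column names that differ only by letter case."""
    groups = defaultdict(list)
    for name in column_names:
        # casefold() is a more aggressive lower() intended for caseless matching.
        groups[name.casefold()].append(name)
    # Keep only the groups where more than one spelling maps to the same key.
    return [names for names in groups.values() if len(names) > 1]


print(find_case_collisions(["ID", "id", "Name", "value", "Value"]))
# [['ID', 'id'], ['value', 'Value']]
```

With a pandas dataframe, the same check could be run as `find_case_collisions(list(df.columns))` before assigning the output variable.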

Column inputs for dataframe

Datagrok provides you with data inputs to select one or multiple columns from a dataframe.

The column input parameter allows you to select one column from the dataframe. In the script, the column parameter is a string variable containing the header name of the selected column.
The column_list input parameter allows you to select multiple columns from the dataframe. In the script, the column_list parameter is a list of strings containing the header names of the selected columns.

Both of these selectors require at least one Dataframe input to choose a dataframe.

Result
Python
JavaScript

ColumnSelectorDemo

#name: ColumnSelectorDemo
#description: Using column selectors
#language: python
#tags: demo, dataframe, column_selector
#input: dataframe table [Data table]
#input: column id_column [Fill this column with auto-generated ID]
#input: column_list data_columns [Keep these columns and drop all others]
#output: dataframe new_table [New table with additional column]

new_table = table.copy()
l = len(new_table)
new_table[id_column] = [f"id_{n:04}" for n in range(l)]
new_table = new_table[ [id_column] + data_columns ]

//name: ColumnSelectorDemo
//description: Using column selectors
//language: javascript
//tags: demo, dataframe, column_selector
//input: dataframe table [Data table]
//input: column id_column [Fill this column with auto-generated ID]
//input: column_list data_columns [Keep these columns and drop all others]
//output: dataframe new_table [New table with additional column]

const new_table = table.clone();
const l = new_table.rowCount;
new_table.col('model').init((i) => `${id_column.get(i)}_${i}`);
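In the Python script, id_column is a string and data_columns is a list of strings, so the final selection is ordinary list concatenation. If the user also selects the ID column among data_columns, the concatenated list names it twice, and pandas would then return that column twice. A minimal sketch in plain Python of building the list without repeats (`selection_order` is a hypothetical helper, not part of Datagrok or pandas):

```python
def selection_order(id_column, data_columns):
    """Put id_column first, then the remaining columns, without repeats."""
    ordered = [id_column]
    for name in data_columns:
        if name not in ordered:
            ordered.append(name)
    return ordered


print(selection_order("model", ["mpg", "model", "cyl"]))
# ['model', 'mpg', 'cyl']
```

In the script, the last line would then read `new_table = new_table[selection_order(id_column, data_columns)]`.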

Choices

Datagrok natively supports choices for primitive input types (usually strings). You can use them to pass the script either a single value from a pre-populated list or a list of selected values.

Single choice

To implement a single choice, specify the choices option in the parameter annotations. For example, let's implement a very simple calculator that accepts two numbers and an operation to perform on them. Datagrok automatically creates a dropdown list with the available operations.

Result
Python
JavaScript

Single Choices example

#name: ChoiceDemoCalculator
#language: python
#input: double a = 2
#input: double b = 3
#input: string action = "+" {choices: ["+", "-", "*", "/"]}
#output: double c

if action == "+":
    c = a + b
elif action == "-":
    c = a - b
elif action == "*":
    c = a * b
else:
    c = a / b

//name: ChoiceDemoCalculatorJS
//language: javascript
//input: double a = 2
//input: double b = 3
//input: string action = "+" {choices: ["+", "-", "*", "/"]}
//output: double c

let c;  // result variable

if (action === '+') {
    c = a + b;
} else if (action === '-') {
    c = a - b;
} else if (action === '*') {
    c = a * b;
} else {
    c = a / b;
}
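Because the set of allowed values is fixed by the choices option, the if/elif chain can also be written as a lookup table. The following is one possible sketch in plain Python, not the way Datagrok requires it; `OPERATIONS` and `calculate` are names invented for this example. Unlike the scripts above, it rejects an unexpected value instead of silently falling through to division.

```python
import operator

# Map each choice shown in the dropdown to the function that implements it.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(a, b, action):
    """Apply the selected operation; reject values outside the choices list."""
    if action not in OPERATIONS:
        raise ValueError(f"Unsupported action: {action!r}")
    return OPERATIONS[action](a, b)


print(calculate(2.0, 3.0, "+"))  # 5.0
print(calculate(2.0, 3.0, "*"))  # 6.0
```

Inside the script body, the whole chain would then reduce to `c = calculate(a, b, action)`.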

Multi-value choice

To implement a multi-choice input, annotate the input variable as list<type> and specify the choices option in the parameter annotations, same as above. To set the initially selected values, provide a list of values as the variable's default value.

Limited support for scripts

Multi-value choices are currently supported only in JavaScript. Support for Python and R is planned for future releases.

Result
JavaScript

Single Choices example

//name: PurificationLogFactor
//language: javascript
//input: list<string> methods = ["Ion chromatography", "Diafiltration" ] {caption: "Purification methods"; choices: ["Gel-filtration", "Ion chromatography", "Diafiltration", "Ultrafiltration"]}
//output: double log_factor_sum


const logFactors = {
  "Gel-filtration": 2.0,
  "Ion chromatography": 4.5,
  "Diafiltration": 6.4,
  "Ultrafiltration": 10.0,
};

log_factor_sum = methods.reduce((sum, method) => {
  const factor = logFactors[method];
  if (typeof factor !== "number") {
    throw new Error(`Unknown purification method: ${method}`);
  }
  return sum + factor;
}, 0);

Dynamic choice selection

Instead of using a fixed list of values, you can define choices using a CSV file, the name of another function (such as a query), or an SQL query.

//input: string shipCountry = "France" {choices: OpenFile("System:AppData/Samples/countries.csv")}
//input: string shipCountry = "France" {choices: Samples:countries}
//input: list<string> company {choices: Query("SELECT DISTINCT company from research_companies")}

Some more examples are provided on the Function annotations page.

File I/O

You can use files as input or output parameters using file and blob annotations.

The file parameter type allows you to read and write a file. Inside the Python/R script this parameter will be a string variable containing a path to the local file.
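Because the parameter is just a path string, any library that accepts a path can read it. The sketch below simulates this outside Datagrok with the standard library only: a temporary file stands in for the path the platform would supply, and `load_records` is a hypothetical helper introduced for illustration.

```python
import json
import os
import tempfile


def load_records(json_file):
    """Read a JSON document from the path held in a file parameter."""
    with open(json_file, encoding="utf-8") as handle:
        return json.load(handle)


# Outside Datagrok there is no file input, so create a temporary file
# to stand in for the path the platform would pass to the script.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as tmp:
    json.dump([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], tmp)
    json_file = tmp.name

records = load_records(json_file)
print(len(records), records[0]["name"])  # 2 a
os.remove(json_file)  # clean up the stand-in file
```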

Reading files

When you use the file annotation for the input parameter, Datagrok creates an interface to load the file.

Result
Python
JavaScript

Scripting-FileIO-load

#name: DfFromJSON
#description: Loads a dataframe from JSON file
#language: python
#tags: template, demo, FileIo
#input: file json_file {caption:JSON file} [A JSON file to load a dataframe]
#output: dataframe df 

import pandas as pd

# json_file holds the path to the uploaded file
df = pd.read_json(json_file)

//name: DfFromJSON
//description: Loads a file and returns first sheet name
//language: javascript
//tags: template, demo, FileIo
//input: file uploadedFile {caption:Excel file}
//output: string first_sheet_name 

const importWb = new ExcelJS.Workbook();
await importWb.xlsx.load(uploadedFile.data);

first_sheet_name = importWb.worksheets[0].name;

You can upload the file from your computer, choose it from Datagrok file storage, or use any of the file connectors supported by Datagrok.

Scripting-FileIO-connectors

The blob input works in a very similar way but provides the binary stream instead of a file name.

#name: BlobTest
#description: Example of Blob usage
#language: python
#tags: template, demo
#input: blob array_blob 
#output: string typeofblob

# Convert the type object to text, since the output is declared as a string
typeofblob = str(type(array_blob))

You can use this capability to effectively transfer a large set of data from one Datagrok function/script to another.
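As a rough illustration of why a binary payload is compact, the sketch below packs a list of numbers into raw bytes and restores them using only the standard library. It assumes the blob reaches a Python script as a bytes-like object, which this page does not confirm (the BlobTest script above reports the actual type); `pack_doubles` and `unpack_doubles` are hypothetical helpers.

```python
from array import array


def pack_doubles(values):
    """Serialize a sequence of floats into raw bytes (8 bytes per value)."""
    return array("d", values).tobytes()


def unpack_doubles(blob):
    """Restore the floats from bytes produced by pack_doubles."""
    restored = array("d")
    restored.frombytes(bytes(blob))
    return restored.tolist()


payload = pack_doubles([1.5, 2.25, -3.0])
print(len(payload))             # 24
print(unpack_doubles(payload))  # [1.5, 2.25, -3.0]
```

The array module uses the machine's native byte order, so this simple format is only safe when the producer and the consumer run on the same kind of machine.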

Saving data to files

You can use both file and blob annotations for output files. For example, let's save a dataframe to a JSON file:

Result
Python

FileIO-Save

#name: DfToJSON
#description: Saves a dataframe to JSON file
#language: python
#tags: template, demo, FileIo
#input: dataframe df [Dataframe to convert to JSON]
#output: file json_file

df.to_json(json_file)

When you run this script, Datagrok will return the FileInfo object in the scalar variables panel. To save the file, right-click on the highlighted file link and choose the Download option. The file name always matches the output variable name.

info

Some Python functions (for example, NumPy's save) automatically add an extension to the file name if it is provided without one. In this case, Datagrok won't be able to locate the output file, and you'll see an empty file in the Datagrok output.

To override this behavior, you can open the file via the Python open function, and use the file object instead of the file name.

#output: file array_file_binary
...
with open(array_file_binary, 'wb') as npfile:
    np.save(npfile, array)
...

Return graphical objects

Datagrok supports a special data type for transferring graphical data.

For Python, this variable can contain any graph created by the matplotlib library (an instance of the matplotlib.figure.Figure class). When you run the script manually, Datagrok captures the graphics object and creates a separate tab to view the results. Datagrok can also save the graphics output in a dataframe or display it in the cell properties.

Datagrok Viewers

Out of the box, Datagrok provides many fast, flexible, interactive viewers suitable for almost all data visualization tasks.

We suggest exploring them before turning to the graphical libraries of your programming language.
