Complex data types

Working with dataframes

For tabular data, Datagrok supports dataframes as input and output parameters. You can pass the script a whole dataframe (dataframe type), a single dataframe column (column), or a list of columns (column_list).

A simple example of dataframe usage is available on the Getting started with scripting page.

Let's modify the default example to accept and return both dataframes and scalars. We copy the original dataframe and add a new ID column to it. Also, we return the number of rows in the table.
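A minimal sketch of such a script might look like the following. The header follows the annotation style used elsewhere on this page. The names table, new_table, and count, the int output type, and the helper add_id_column are assumptions made for illustration, not the original example. The logic is wrapped in a function so that it also runs outside Datagrok, where nothing supplies the input dataframe.

```python
#name: DataframeIdDemo
#description: Copies a dataframe, adds an ID column, and counts rows (illustrative)
#language: python
#input: dataframe table
#output: dataframe new_table
#output: int count
import pandas as pd


def add_id_column(table: pd.DataFrame):
    """Return a copy of `table` with a 1-based ID column, plus the row count."""
    new_table = table.copy()  # leave the input dataframe untouched
    new_table["ID"] = range(1, len(new_table) + 1)
    return new_table, len(new_table)


# Stand-in input for running the sketch outside Datagrok (hypothetical data).
# In a Datagrok script, pass the injected `table` input instead of `sample`.
sample = pd.DataFrame({"height": [170, 182, 165]})
new_table, count = add_id_column(sample)
```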

When you run the script, you will see the following dialog:

dataframe-id-demo

Datagrok created the script UI, populated default values, and created popups with help text.

After you run this script, Datagrok automatically opens the new dataframe. It contains an additional column, ID, holding the generated row IDs.

note

Datagrok's dataframe converts to:

Case-insensitive column names

In Datagrok, unlike Python/R dataframes, column names are case-insensitive. If you return a dataframe with columns whose names differ only by letter case, Datagrok will automatically add a number to the column header to differentiate them.

To prevent any confusion or issues with column names, we recommend using unique names that are distinct regardless of case.
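One way to catch such clashes before returning a dataframe is to compare the column names after case folding. The helper below is a hypothetical sketch in plain Python and is not part of the Datagrok API.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group column names that are identical once letter case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups in which two or more names collide.
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical column names; "ID", "id", and "Id" differ only by case.
collisions = find_case_collisions(["ID", "id", "Height", "weight", "Id"])
```

With a pandas dataframe, you could pass list(df.columns) to the helper and rename any reported columns before returning the result.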

Column inputs for dataframe

Datagrok provides you with data inputs to select one or multiple columns from a dataframe.

  • The column input parameter allows you to select one column from the dataframe. In the script, the column parameter is a string variable containing the header name of the selected column.
  • The column_list input parameter allows you to select multiple columns from the dataframe. In the script, the column_list parameter is a list of strings containing the header names of the selected columns.

Both selectors require at least one dataframe input, from which the columns are chosen.

ColumnSelectorDem
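Because both parameters arrive as plain strings, they can be used directly to index the dataframe. The sketch below assumes pandas. The header lines, the parameter names value_column and feature_columns, and the helper describe_selection are illustrative assumptions.

```python
#input: dataframe table
#input: column value_column
#input: column_list feature_columns
import pandas as pd


def describe_selection(table, value_column, feature_columns):
    # value_column is a single string, so it selects one column (a Series).
    value_mean = table[value_column].mean()
    # feature_columns is a list of strings, so it selects a sub-dataframe.
    features = table[feature_columns]
    return value_mean, features


# Stand-in data for running the sketch outside Datagrok (hypothetical values).
sample = pd.DataFrame(
    {"age": [30, 40, 50], "height": [170, 180, 175], "weight": [60, 80, 70]}
)
value_mean, features = describe_selection(sample, "age", ["height", "weight"])
```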

File I/O

You can use files as input or output parameters using file and blob annotations.

The file parameter type allows you to read and write a file. Inside the Python/R script this parameter will be a string variable containing a path to the local file.
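Since the parameter is just a path, ordinary file handling applies. The sketch below uses only the standard library. The parameter name input_file and the helper count_lines are hypothetical, and a temporary file stands in for the file that Datagrok would provide.

```python
#input: file input_file
import os
import tempfile


def count_lines(input_file: str) -> int:
    """Open the file by its path and count the lines it contains."""
    with open(input_file, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


# Stand-in for the file Datagrok would supply (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("first\nsecond\nthird\n")
    line_count = count_lines(path)
```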

Reading files

When you use the file annotation for the input parameter, Datagrok creates an interface to load the file.

Scripting-FileIO-load

You can upload the file from your computer, choose it from Datagrok file storage, or use any of the file connectors supported by Datagrok.

Scripting-FileIO-connectors

The blob input works in a very similar way but provides the binary stream instead of a file name.

#name: BlobTest
#description: Example of Blob usage
#language: python
#tags: template, demo
#input: blob array_blob
#output: string typeofblob

# Convert the type object to text so it matches the declared string output
typeofblob = str(type(array_blob))

You can use this capability to efficiently transfer large amounts of data from one Datagrok function or script to another.
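As a rough illustration of passing numeric data as binary, the sketch below packs a list of floats into bytes and unpacks it again. It assumes the blob arrives as a bytes-like object. The BlobTest script above is a way to check what your environment actually provides. The helpers pack_values and unpack_values are hypothetical.

```python
from array import array


def pack_values(values):
    """Producer side: encode floats as raw 8-byte doubles."""
    return array("d", values).tobytes()


def unpack_values(blob):
    """Consumer side: decode the raw bytes back into a list of floats."""
    values = array("d")
    values.frombytes(bytes(blob))  # bytes() also accepts bytearray and memoryview
    return list(values)


payload = pack_values([1.5, 2.25, -3.0])
restored = unpack_values(payload)
```

A raw binary layout like this avoids the cost of formatting and parsing text, which is where the benefit for large datasets would come from.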

Saving data to files

You can use both file and blob annotations for output files. For example, let's save a dataframe to a JSON file:
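A minimal sketch of such a script, assuming pandas, is shown here. The header lines, the output name json_file, and the helper save_as_json are illustrative assumptions rather than the original example. A temporary directory stands in for the output path that Datagrok would supply.

```python
#input: dataframe table
#output: file json_file
import json
import os
import tempfile

import pandas as pd


def save_as_json(table: pd.DataFrame, json_file: str) -> None:
    # json_file is a path string, so pandas can write to it directly.
    table.to_json(json_file, orient="records")


# Stand-in data and output path for running outside Datagrok (hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    json_file = os.path.join(tmp, "json_file")
    save_as_json(pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}), json_file)
    with open(json_file, encoding="utf-8") as handle:
        records = json.load(handle)  # read back to check what was written
```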

FileIO-Save

When you run this script, Datagrok will return the FileInfo object in the scalar variables panel. To save the file, right-click on the highlighted file link and choose the Download option. The file name always matches the output variable name.

info

Some Python functions (for example, NumPy's np.save) automatically append an extension to the file name if it is given without one. In this case, Datagrok won't be able to locate the output file, and you'll see an empty file in the Datagrok output.

To avoid this, open the file with Python's open function and pass the file object instead of the file name.

#output: file array_file_binary
...
with open(array_file_binary, 'wb') as npfile:
    np.save(npfile, array)
...
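The difference can be reproduced outside Datagrok. The standalone sketch below saves the same array twice into a temporary directory, once by file name and once through a file object. The paths and variable names are hypothetical.

```python
import os
import tempfile

import numpy as np

array = np.arange(5)

with tempfile.TemporaryDirectory() as tmp:
    # Passing a bare file name: NumPy appends ".npy", so the expected path stays missing.
    by_name = os.path.join(tmp, "array_by_name")
    np.save(by_name, array)
    bare_name_exists = os.path.exists(by_name)
    npy_name_exists = os.path.exists(by_name + ".npy")

    # Passing an open file object: NumPy writes to exactly the path that was opened.
    by_handle = os.path.join(tmp, "array_by_handle")
    with open(by_handle, "wb") as npfile:
        np.save(npfile, array)
    handle_name_exists = os.path.exists(by_handle)
    restored = np.load(by_handle)  # the data is intact despite the missing extension
```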

Return graphical objects

Datagrok supports a special data type for transferring graphical data.

For Python, this variable can contain any graph created by the matplotlib library (an instance of the matplotlib.figure.Figure class). When you run the script manually, Datagrok captures the graphics object and shows the result in a separate tab. Datagrok can also save the graphics output in a dataframe or display it in the cell properties.
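As an illustration, the sketch below builds a matplotlib figure of the kind described above. The graphics type name in the header and the output name scatter are assumptions, so check the parameter reference for your Datagrok version. The Agg backend line is only there so that the sketch runs headless outside Datagrok.

```python
#output: graphics scatter
import matplotlib

matplotlib.use("Agg")  # headless backend; assumed unnecessary inside Datagrok

import matplotlib.pyplot as plt
from matplotlib.figure import Figure

# Hypothetical data to plot.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# plt.subplots() returns a Figure and an Axes; the Figure is the object to return.
scatter, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("Illustrative scatter plot")

is_figure = isinstance(scatter, Figure)
```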

Datagrok Viewers

Out of the box, Datagrok provides many fast, flexible, interactive viewers suitable for almost all data visualization tasks.

We suggest exploring them before turning to the graphical libraries of your programming language.