Function annotations
There are various types of functions, such as scripts or queries. All of them are annotated in a similar way, and that enables universal support for functions in the platform.
A function annotation, also known as a header, is a multi-line comment that is placed above the function declaration. It contains information about the function name, role, as well as function inputs and outputs. Inputs and outputs can have metadata associated with them as well.
When you run a function, Datagrok generates the UI from the parameter annotations. You can build interactive experiences that allow validation, controlled vocabularies, autocomplete, custom inputs, dynamic lookups, referencing parameters, and function chaining - without writing a single line of UI code!
Function
There are general parameters common to all functions, as well as parameters specific to certain function types. Not all general parameters are required, the list of parameters depends on the function type, role, and so on.
Simple python script
#language: python
#name: GetCellNumber
#description: Calculates number of cells in the table
#tags: demo
#input: dataframe table [Data table]
#output: int count [Number of cells in table]
count = table.shape[0] * table.shape[1]
These are the common parameters for all functions:
name
: shows up in the user interfacedescription
: shows up in a function tooltiptags
: comma-separated tags that you can use in searchhelp-url
: help that shows up when you click on the "?" in the function dialogreference
: Reference to a research paper, Wikipedia article, Git repository, etc.top-menu
: Top menu path separated with pipes (|
), such asChem | Gasteiger Charges
Some parameters are specific to script language and/or technology:
- Script
language
: script language (supported:r
,python
,octave
,julia
,grok
,javascript
)environment
: script environment (Conda environment for python, etc)sample
: path to a sample data csv file. When defined, a*
icon appears on the ribbon panel that loads it.
- Script-based info panels
condition
: GrokScript condition that gets evaluated to decide whether to show the panel for the object
- Query
connection
: Name of the db connection that the query should use
To add additional parameters, use the meta.
prefix. They can be used for dynamically searching for
the functions of interest.
Inputs and outputs
Each input and output definition take one line that starts with the comment, followed by type, name, optional default value, options, and optional description, just like here:
#input: string country {choices: ["USA", "Canada"]} [Country to ship from]
#input: double weight
#output: double price
There is a crucial difference between annotation of scrips and Javascript functions in packages. When you annotate a script in any supported language, parameters are mapped by the parameter name. So the parameter name and value always are consistent.
When you annotate a Javascript function from a package, parameters are mapped by parameter order instead of name. Let's explore it on the following example:
//name: ParameterTestFunction
//description: Small function to illustrate parameter mapping
//language: javascript
//input: int one=1 {caption: First} [First parameter]
//input: int two=2 {caption: Second} [Second parameter]
export function ParameterTestFunctionPkg(two, one) {
const result = `First:${one}, second:${two}`;
console.log(result);
}
In the function signature, the parameter names go in the different order: (two, one)
,
comparing to the annotation.
As a result, the function will produce the unexpected result: First:2, second:1
To avoid mistakes, we suggest that you always use exactly the same parameter order in the function annotation and the function signature.
Parameter types and options
Datagrok supports the following types in all scripting languages:
- Scalars:
int
,double
,bool
,string
,datetime
- Table:
dataframe
,column
,column_list
- Collections:
list
(typically of strings) graphics
: typically a function output. See examplefile
: when the script is executed, contains a string with the path to a fileblob
: array of bytes
Some of the options apply to all parameters, while other are type-specific.
Reference: list of options applicable to different parameters
For all parameters:
Option | Value | Description |
---|---|---|
validators | string | Comma-separated list of validators |
caption | string | Custom field caption |
postfix | string | Field postfix |
units | string | Value unit name |
nullable | bool | Makes it an optional parameter |
For dataframe
type:
Option | Value | Description |
---|---|---|
columns | numerical | Only numerical columns will be loaded |
columns | categorical | Only categorical columns will be loaded |
For column
and column_list
types
Option | Value | Description |
---|---|---|
type | numerical,categorical,dateTime | In a dialog, only numerical columns will be shown |
format | MM/dd/yyyy | Datetime format, for dateTime columns and datetime type only |
allowNulls | true/false | Adds validation of the missing values presence |
action | join("table parameter name") | Joins result to the specified table, for output parameters only |
action | replace("table parameter name") | Replaces result with columns in specified table, for output parameters only |
For string
type
Option | Value | Description |
---|---|---|
choices | Comma-separated list of values, or a function name that returns a list of strings | Makes it a combo box |
suggestions | Name of a function that returns a list of strings to autocomplete user input | Autocomplete gives you options as you type |
For numeric
types
Option | Description |
---|---|
min, max | Minimum and maximum to be validated. When both are defined, slider is added for the float input, and +/- clicker is added to the int input |
For list
type`
Option | Value |
---|---|
separators | Characters used to split string to list of values. Default is , |
Separators apply only for the TextArea input type. The following example demonstrates how separators work for the Postgres-based SQL query:
Example: Using separators in a query
--name: OrdersByEmployee
--friendlyName: OrdersByEmployee
--connection: PostgresNorthwind
--input: string shipCountry = "Spain" {choices: Query("SELECT DISTINCT shipCountry FROM orders")}
--input: string shipCity = "Barcelona" {choices: Query("SELECT DISTINCT shipcity FROM orders WHERE shipCountry = @shipCountry")}
--input: string customerId = "GALED" {choices: Query("SELECT DISTINCT customerid FROM orders WHERE shipCity = @shipCity")}
--input: list<string> employee {inputType: TextArea; separators: ,}
SELECT *
FROM orders
INNER JOIN employees
ON orders.employeeId = employees.employeeId
WHERE lastName in (SELECT unnest(@employee))
Initial values and optional parameters
Proper handling of the empty parameters requires special efforts when building a SQL query, and passing empty parameters to the function that does not expect it is a major source of errors. To deal with it, by default each parameter is required, but you can specify initial values and make it optional (nullable).
- Initial value gets shown in the dialog when you execute the function. If you remove the value, an empty value is passed to the function. But before the function is executed, the input is validated, and you will get an error if a required parameter is not specified.
- Optional parameter: make a parameter optional by specifying the
nullable: true
option.
For example, to create an optional string parameter with the initial value:
--input: string shipCountry = "France" { nullable: true }
SELECT * FROM customers where shipCountry = @shipCountry
Filter patterns
Filter pattern allows you to use free-text conditions like "this week" for dates, or ">50" for numbers.
To use search patterns, set the input type to string
, and set the pattern
option
to the type of the column you are filtering. Then, reference the pattern in the query
as @patternName(columnName)
, just like we did here for the "freight" column:
--input: string freightValue = >= 10.0 {pattern: double}
select * from Orders where @freightValue(freight)
Different inputs would produce differently structured SQL (also dependent on the database).
Input | SQL | Description |
---|---|---|
select * from orders where 1 = 1 | No input => no filter | |
>3 | select * from orders where freight > 3 | Using column name to filter |
10-20 | select * from orders where (freight >= 10 and freight <= 20) | Have to do multiple comparisons |
In this example, the freightValue
input parameter is defined as a string with a default value of >= 10.0
.
The pattern
option specifies that the actual data type is a double
. In the query, a reference to
@freightValue(freight)
specifies the pattern that will be evaluated against the "freight" column.
Reference: Supported search patterns
Type | Value | Description or example |
---|---|---|
num, int, double | = | = 100 |
> | > 1.02 | |
>= | >= 4.1 | |
< | < 5 | |
<= | <= 2 | |
in | in (1, 3, 10.2) | |
min-max | Range: 1.5-10.0 | |
string | contains | contains ea |
starts with | starts with R | |
ends with | ends with w | |
regex | regex 1(\w+)1 | |
in | in (ab, "c d", "e\\"f\\"") | |
datetime | anytime | |
today | ||
this week | ||
this month | ||
this year | ||
yesterday | ||
last week | ||
last month | ||
last year | ||
before | before July 1984 | |
after | after March 2001 | |
min-max | Range: 1941-1945 |
To learn more, see search patterns.
Choices
Use choices
to make input a combo box, and restrict the selection to the defined set of values.
You can define choices
using a comma-separated list of values, a name of another function (such as query),
or by writing an actual SQL query.
Example: SQL query: single choice: different ways to specify a list of countries
--input: string shipCountry = "France" {choices: ['France', 'Italy', 'Germany']}
--input: string shipCountry = "France" {choices: Samples:countries}
--input: string shipCountry = "France" {choices: Query("SELECT DISTINCT shipCountry FROM Orders")}
--input: string shipCountry = "France" {choices: OpenFile("System:AppData/Samples/countries.csv")}
Example: SQL query: multiple choice for the "list" input
When choices
is applied to the list
parameter, the input becomes a multiple choice, just like in
this example:
--input: list<string> company {choices: Query("SELECT DISTINCT company from research_companies")}
Validation
In addition to the built-in checks that the value is not missing, and that it satisfies min-max conditions, you can specify custom validation functions that get executed before calling the function to assure the validity of the parameters.
The easiest way is to define a validator
GrokScript expression that gets evaluated when input changes.
Result true
or null
means that the input is valid. false
or a string error message means that the input is invalid,
it gets highlighted and the validation message is shown in the tooltip. Note that the expression can depend not only on the
value of the parameter the expression applied to, but on other parameters as well.
Example: Inline validation dependent on the value of other parameters
//input: int foo = 5 { validator: bar > 3 }
//input: double bar = 2 { min: 0; max: 10 }
The second option involves using a custom validation function and referencing it. Usually it's a JavaScript function that gets executed right in the browser, but you can use other languages as well. A validation function accepts one parameter (a string that user enters), and returns null if the string is valid, or the reason for being invalid, otherwise.
Example: Functions as validators
The following example adds a "containsLettersOnly" function to the "col" parameter:
#input: string s {validators: ["containslettersonly"]}
#input: column col {validators: ["containsmissingvalues"]}
//name: jsVal1
//input: string s
//output: string valid
valid = input < 11 ? null : "Error val1";
#name: Numbers
#language: python
#input: int count1 {validators: ["jsval1"]
"Visible" and "enabled" expressions
You can control input's visibility and enabled state by specifying GrokScript expressions on the parameter level, similarly to validation. Note that the expression can depend not only on the value of the parameter the expression applied to, but on other parameters as well.
Example: input visibility and enabled state dependent on the value of other parameters
//input: string type = 'ICE' { choices: ['Electric', 'ICE'] }
//input: int cylinders = 4 { visible: type == 'ICE' }
//input: double tankVolume = 40 { visible: type == 'ICE'; units: liters }
//input: bool tankExtension = false { visible: type == 'ICE'; enabled: tankVolume > 50 }
//input: double batteryCapacity = 80 { visible: type == 'Electric'; units: kWh }
Lookup tables
Lookup tables let you initialize inputs with a set of pre-defined values. To use it, set
the choices
attribute to a function that returns a dataframe, and set propagateChoice
to "all".
The first column is used as a key in the combo box. When you select it, the input fields are initialized with the
values from the corresponding columns (input names and column names have to match).
Example: Lookup table
The following example lets you initialize multiple car's parameters based on the model that you select.
Note that here we use the OpenFile
function to read a dataframe from the CSV file on a file share; it would
work the same if we read it from the database using the Query
function.
//input: string model { choices: OpenFile("System:AppData/Compute/cars.csv"); propagateChoice: all }
//input: double mpg
//input: int cyl
//input: int disp
model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160,110,3.90,2.620,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160,110,3.90,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108,93,3.85,2.320,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360,175,3.15,3.440,17.02,0,0,3,2
Referencing other parameters
Parameter's choices, validators, and suggestions can depend on the value of another parameter.
This is useful when creating queries with hierarchical choices, where each subsequent parameter
depends on the previous one. To do this, reference the parameter in another parameter's annotation
using the @
symbol.
Example: SQL-based hierarchical query
Let's say we want to build a UI, where you first select a state and then choose a city from
the corresponding state. All we need to do is to reference the @state
parameter in the query
that retrieves a list of cities:
--input: string state {choices: Query("SELECT DISTINCT state FROM public.starbucks_us")}
--input: string city {choices: Query("SELECT DISTINCT city FROM public.starbucks_us WHERE state = @state")}
SELECT * FROM public.starbucks_us WHERE (city = @city)
Datagrok does the rest, and turns it into an interactive experience:
Autocomplete
Use autocomplete to help users enter a correct value. For instance, when entering a product name, it might make sense to dynamically query a database for values starting with the already entered text, and suggest to auto-complete the value.
Use the suggestions
option to enable autocomplete, and specify the name of a function that
accepts one string parameter, and returns a list of strings (or a dataframe with one string column)
to be used as suggestions as the user types the value.
Example: SQL-based autocomplete function
Here are two SQL functions (from the Chembl package), where the UI for the "Structures by Organism" query uses the "organismsAutocomplete" function to complete user input:
--name: organismsAutocomplete
--input: string sub
select distinct organism from target_dictionary
where organism ilike '%' || @sub || '%'
limit 50
--name: StructuresByOrganism
--input: string organism = "Shigella" {suggestions: Chembl:organismsAutocomplete}
SELECT md.chembl_id AS compound_chembl_id,
cs.canonical_smiles,
act.standard_type,
act.standard_value,
act.standard_units,
td.chembl_id AS target_chembl_id,
td.organism, td.pref_name
FROM target_dictionary td
JOIN assays a ON td.tid = a.tid
JOIN activities act ON a.assay_id = act.assay_id
JOIN molecule_dictionary md ON act.molregno = md.molregno
JOIN compound_structures cs ON md.molregno = cs.molregno
JOIN organism_class oc ON td.tax_id = oc.tax_id
AND td.organism = @organism
AND oc.L1 = 'Bacteria'
Function inputs
To reuse other "helper" functions along with their editors for your top-level function, specify
the helper function using the editor
option. In the following example, we have a parameter orders
that points to the Samples:OrdersByEmployee
function, so in the autogenerated UI, we see the inputs
of that function instead of the normal dataframe input. When the top-level function is executed,
orders
parameters gets evaluated to the results of the execution of the Samples:OrdersByEmployee
function with the specified parameters.
To get the internal parameters of the orders
function, you can specify them in the input using the
editorParam
option. In the following example, we have a parameter country
that points to the shipCountry
parameter of the orders
function. When the top-level function is executed, country
parameter gets evaluated
to the value of the orders.shipCountry
parameter. That way, you can reference the internal parameters of the
helper function and use them in the top-level function.
This powerful technique allows to reuse functions, and mix multiple technologies and languages within one script. You can get your data with a SQL query, run calculations in Python, and then visualize it interactively in Datagrok - all of that without writing a single line of UI code. To learn more, see Compute.
Example: JavaScript function that uses SQL query as a function input
//language: javascript
//input: dataframe orders {category: Data; editor: Samples:OrdersByEmployee}
//input: int factor = 2 {category: Computation}
//input: string country {category: Computation; editorParam: orders.shipCountry}
//output: string result
result = `${country} - ${orders.rowCount * factor}`;
Input types
Input fields such as text boxes or combo boxes get generated automatically based on
the property attributes. You can also explicitly set the inputType
option. Here, we
set it to Radio
to make the input appear as a radio button instead of the combo box:
//input: string fruit { choices: ["Apple", "Banana"], inputType: Radio }
Reference: Supported input types
Input types have to match the data types (input types in bold are the default ones that you do not have to specify):
Input type | Data types | Example / description |
---|---|---|
Int | int | { min: 0; max: 20; step: 4} |
BigInt | bigint | |
Float | double | { min: 0; max: 1; step: 0.03 } |
Bool | bool | |
Text | string | |
Date | datetime | |
MultiChoice | list | { choices: ["A", "B"] } |
List | list | { inputType: TextArea; separators: ,} |
Column | column | |
Slider | int, double | { min: 0; max: 20; step: 4} |
Color | string | |
Radio | string | {inputType: Radio; choices: ["A", "B"]} |
Molecule | string |
Details
Example: Using "Slider", "Color", and "Radio" input types
Check out interactive snippet for more input types.
For developers: DG.InputBase
Inputs for semantic types
Datagrok automatically detects semantic types, and you can also specify semantic types of input parameters. In this case, a corresponding input will be used, if it is defined.
Example: Automatically using the molecular sketcher for the "Molecule" semantic type
This is how an input field for the "Molecule" semantic type looks like. When you click a molecule, a molecule sketcher pops up.
--input: string substructure = 'c1ccccc1' {semType: Molecule}
Examples
TypeScript function
//name: Len
//description: Calculates the length of a string
//input: string s
//output: int n
export function getLength(s: string): number {
return s.length;
}
Python script
#name: Template
#description: Calculates number of cells in the table
#language: python
#tags: template, demo
#sample: cars.csv
#input: dataframe table [Data table]
#output: int count [Number of cells in table]
count = table.shape[0] * table.shape[1]
Query
--name: protein classification
--connection: chembl
select * from protein_classification;
--end
Complex annotation example
#input: dataframe t1 {columns:numerical} [first input data table]
#input: dataframe t2 {columns:numerical} [second input data table]
#input: column x {type:numerical; table:t1} [x axis column name]
#input: column y {type:numerical} [y axis column name]
#input: column date {type:datetime; format:mm/dd/yyyy} [date column name]
#input: column_list numdata {type:numerical; table:t1} [numerical columns names]
#input: int numcomp = 2 {range:2-7} [number of components]
#input: bool center = true [number of components]
#input: string type = high {choices: ["high", "low"]} [type of filter]
#output: dataframe result {action:join(t1)} [pca components]
#output: graphics scatter [scatter plot]
See also: