Skip to main content

Data connection

Data connection is used for accessing data in a particular data source. Connection parameters depend on the data source. Typically, you would need to provide server name and login credentials.

Connectors

A connector could work with a database, an Excel file, a CSV file, a web service, or basically anything that is capable of providing the data. We currently support over 20 different connectors, and the list is quickly growing. Most of our data connectors are open-sourced and could be found on GitHub (MIT license).

NameType
AccessJDBC
AthenaJDBC
BigQueryJDBC
CassandraJDBC
DB2JDBC
DenodoJDBC
DropBoxFiles
FilesFiles
FirebirdJDBC
GitFiles
GoogleCloudFiles
HBaseJDBC
HiveJDBC
Hive2JDBC
ImpalaJDBC
MariaDBJDBC
MongoDBJDBC
MS SQLJDBC
MySqlJDBC
Neo4jJDBC
OData
OracleJDBC
PostgresJDBC
RedshiftJDBC
S3Files
SnowflakeJDBC
Socrata
Sparql
SQLiteJDBC
TeradataJDBC
Twitter
VerticaJDBC
VirtuosoJDBC
Web

Creating a connection

To create a new data connection, open the "Databases" pane (Open | Databases), right-click on the appropriate connector in the tree, and choose "Add connection...". Alternatively, click on "New Connection" under the "Actions" tab, and select the appropriate connector.

Data Connection Tree

Editing properties

Then, edit the connection attributes, and click on TEST to confirm that you've entered everything correctly. The set of attributes you can edit depends on the connector. Typically, for JDBC-based connectors you can provide a custom connection string (but do not enter login and password there, they will still be picked up from the corresponding fields).

Create Data Connection

Once a connection is set up, you are ready to start creating queries. There are multiple ways to do so: manually or programmatically.

Note that the platform supports caching of results.

Credentials

Credentials required to access data sources should be defined in the setup connection. Read more about credentials here: Credentials.

Access control

Connections are first-class entities in the Datagrok platform, and as such are subjects to the standard checks and routines performed against them whenever they are used in the specific context. Some of the most popular privileges are: view, edit, delete, and share. Those privileges can be given to individual users, or to groups. For more information on the access privilege model, check out privileges.

Another “out of the box” feature that comes with connections being first-class entity is the audit trail for every action performed against the connection. For details on that, check out Audit page.

Caching

You can force the platform to cache results of executing queries (taking into account parameters as well). This is useful when a query executes slowly and returns relatively small results. A popular use case is caching the values that are used for building the UI automatically (typically this is some form of select distinct <name> from <table>) is a good idea.

You can turn caching on for either the whole connection, or for a particular query. For a connection, open its properties, and check the "Cache Results" checkbox. When you check it, the "Invalidate On" input becomes visible. Enter the cron expression there to define cache invalidation timepoints (for instance, if a database refreshes overnight and you want to invalidate it at 1am each night, enter 0 0 1 * * ?). Leaving the field blank will make the cache stay forever.

If a connection gets created automatically as part of the package, you can specify the cacheResults parameter in the connection json definition:

{
"name": "Northwind",
"parameters": {
"server": "dev.datagrok.ai",
"port": 23306,
"db": "Northwind",
"cacheSchema": false,
"cacheResults": true,
"ssl": false,
"connString": ""
}
}

To cache results of individual queries, edit the query (either via Datagrok UI if a query already exists, or by editing the corresponding .sql file of the package queries) and specify the meta.cache and meta.invalidate fields:

--name: getProductNames
--input: string department
--meta.cache: true
--meta.invalidate: 0 0 1 * * ?
select distinct name from products p
where p.department = @department

Filtering

You can use these fields to filter connections with smart search:

FieldDescription
ID
name
server
port
db
login
dataSource
description
createdOn
updatedOn
authorUser object
starredByUser object
commentedByUser object
usedByUser object

JDBC connection

For some cases connection may require custom JDBC connection string. For this case, JDBC-based data connection has parameter "Conn. string". If filled, it will be used for connection, and all other parameters will be ignored except for "Login" and "Password".

Videos

Data connection

See also: