Data connection
Data connection is used for accessing data in a particular data source. Connection parameters depend on the data source. Typically, you would need to provide server name and login credentials.
Connectors
A connector could work with a database, an Excel file, a CSV file, a web service, or basically anything that is capable of providing the data. We currently support over 20 different connectors, and the list is quickly growing. Most of our data connectors are open-sourced and could be found on GitHub (MIT license).
Name | Type |
---|---|
Access | JDBC |
Athena | JDBC |
BigQuery | JDBC |
Cassandra | JDBC |
DB2 | JDBC |
Denodo | JDBC |
DropBox | Files |
Files | Files |
Firebird | JDBC |
Git | Files |
GoogleCloud | Files |
HBase | JDBC |
Hive | JDBC |
Hive2 | JDBC |
Impala | JDBC |
MariaDB | JDBC |
MongoDB | JDBC |
MS SQL | JDBC |
MySql | JDBC |
Neo4j | JDBC |
OData | |
Oracle | JDBC |
Postgres | JDBC |
Redshift | JDBC |
S3 | Files |
Snowflake | JDBC |
Socrata | |
Sparql | |
SQLite | JDBC |
Teradata | JDBC |
Vertica | JDBC |
Virtuoso | JDBC |
Web |
Creating a connection
To create a new data connection, open the "Databases" pane (Open | Databases), right-click on the appropriate connector in the tree, and choose "Add connection...". Alternatively, click on "New Connection" under the "Actions" tab, and select the appropriate connector.
Editing properties
Then, edit the connection attributes, and click on TEST to confirm that you've entered everything correctly. The set of attributes you can edit depends on the connector. Typically, for JDBC-based connectors you can provide a custom connection string (but do not enter login and password there, they will still be picked up from the corresponding fields).
Once a connection is set up, you are ready to start creating queries. There are multiple ways to do so: manually or programmatically.
Note that the platform supports caching of results.
Credentials
Credentials required to access data sources should be defined in the setup connection. Read more about credentials here: Credentials.
Access control
Connections are first-class entities in the Datagrok platform, and as such are subjects to the standard checks and
routines performed against them whenever they are used in the specific context. Some of the most popular privileges
are: view
, edit
, delete
, and share
. Those privileges can be given to individual users, or
to groups. For more information on the access privilege model, check
out privileges.
Another “out of the box” feature that comes with connections being first-class entity is the audit trail for every action performed against the connection. For details on that, check out Audit page.
Caching
You can force the platform to cache results of executing queries (taking into account parameters as well). This is
useful when a query executes slowly and returns relatively small results. A popular use case is caching the values that
are used for building the UI automatically
(typically this is some form of select distinct <name> from <table>
) is a good idea.
You can turn caching on for either the whole connection, or for a particular query. For a connection, open
its properties, and check the "Cache Results" checkbox. When you check it, the "Invalidate On"
input becomes visible. Enter the
cron expression
there to define cache invalidation timepoints (for instance, if a database refreshes overnight and you want to
invalidate it at 1am each night, enter 0 0 1 * * ?
). Leaving the field blank will make the cache stay forever.
If a connection gets created automatically as part of the package, you can specify the cacheResults
parameter in the connection json definition:
{
"name": "Northwind",
"parameters": {
"server": "dev.datagrok.ai",
"port": 23306,
"db": "Northwind",
"cacheSchema": false,
"cacheResults": true,
"ssl": false,
"connString": ""
}
}
To cache results of individual queries, edit the query (either via Datagrok UI if a query already exists, or by editing
the corresponding .sql file of the package queries)
and specify the meta.cache
and meta.invalidate
fields:
--name: getProductNames
--input: string department
--meta.cache: true
--meta.invalidate: 0 0 1 * * ?
select distinct name from products p
where p.department = @department
Filtering
You can use these fields to filter connections with smart search:
Field | Description |
---|---|
ID | |
name | |
server | |
port | |
db | |
login | |
dataSource | |
description | |
createdOn | |
updatedOn | |
author | User object |
starredBy | User object |
commentedBy | User object |
usedBy | User object |
JDBC connection
For some cases connection may require custom JDBC connection string. For this case, JDBC-based data connection has parameter "Conn. string". If filled, it will be used for connection, and all other parameters will be ignored except for "Login" and "Password".
Videos
See also: