Datagrok provides a single, unified access point for data across an organization, simplifying the process of centralizing data collected from multiple sources. You can easily visualize, explore, and learn from your data, and use the insights gained to take action. Additionally, the platform provides access controls, security features, caching, and automatic monitoring of connection health.
Besides local files that you can drag and drop from your computer, Datagrok integrates with various data providers. You can connect to any machine-readable source: file storage (such as third-party cloud services or an organization-hosted Datagrok server), databases, or web services.
Datagrok also supports scripting in various languages, such as R, Julia, and Python, which means you can create custom data sources. For example, you can load a dataframe from an external website or package, open a specific table using its ID, or write a package to extract data from multiple sources and combine them into one. For more information on getting data using functions and scripts, see the Access data section in the developers' documentation.
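As a rough illustration of the "extract data from multiple sources and combine them into one" idea, here is a minimal Python sketch. It is not Datagrok's API: the two inline strings stand in for external sources, and the names `load_csv`, `load_json`, and `combine` are hypothetical helpers invented for this example. It uses only the standard library and plain lists of dicts so that it runs anywhere; an actual platform script would more likely return a dataframe.

```python
import csv
import io
import json

# Hypothetical stand-ins for two external sources (for example, a file and a web service).
CSV_SOURCE = "id,name\n1,aspirin\n2,caffeine\n"
JSON_SOURCE = '[{"id": 1, "mw": 180.16}, {"id": 2, "mw": 194.19}]'


def load_csv(text):
    """Parse CSV text into a list of row dicts, converting the id column to int."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["id"] = int(row["id"])
    return rows


def load_json(text):
    """Parse a JSON array of objects into a list of row dicts."""
    return json.loads(text)


def combine(left, right, key):
    """Inner-join two lists of row dicts on the given key column."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


combined = combine(load_csv(CSV_SOURCE), load_json(JSON_SOURCE), "id")
for row in combined:
    print(row)
```

The join is keyed on a shared column, so rows present in only one source are dropped; a different merge policy (for example, keeping unmatched rows) would need a small change to `combine`.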
Datagrok also hosts public datasets that can be used for analysis, testing, and prototyping. These datasets cover various domains, including cheminformatics, clinical trials, and more.
A data connection is an entity representing information required to connect to a specific data source, such as its address and credentials. A data connection allows you to work with files and database tables directly in Datagrok. When connecting to a data source, you can access data manually from the UI, or programmatically, through an application. For manual data access, Datagrok provides a convenient UI that lets you connect directly to any of the 30+ supported connectors, retrieve data using queries, and securely share data with others.
A connector is a plugin that enables the integration of external data providers into the platform. It can work with a database, an Excel file, a CSV file, a web service, or any other source capable of providing the data. Most of our data connectors are open source and extensible (under the MIT license).
To see all available data source connections, on the Sidebar, select Manage > Connections. From there, you can search for connections by name or by tag.
A data connection is an entity, which means it can be shared, assigned permissions, annotated, and more.
For instructions on how to add a supported data source, set credentials, share, and manage it from the UI, see the documentation for each data source type.
For instructions on how to add a supported data source, set credentials, share, and manage it programmatically, see the developers' documentation.
For specific details on the configuration required, see each individual connector's documentation page in the Connectors directory.
A data query is a function associated with a data connection that typically returns a dataframe. Queries can be executed either manually or as part of data jobs. Datagrok has a convenient interface for creating, running, and sharing query results, including an aggregation editor, auto-generated parameter dialogs, and the ability to create dynamic dashboards to visualize query results. All data governance features, such as data lineage, history, and security, are applicable to queries. For more information about queries, see the documentation for the respective data source type.
Typically, a query is run against a database; however, the same concepts apply to the other data sources listed below:
A data query is an entity, which means it can be shared, assigned permissions, annotated, and more.
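To make the notion of a query as "a function associated with a data connection" that returns tabular data more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It does not use Datagrok's query mechanism: the in-memory database, the `orders` table, and the function `orders_by_country` are assumptions made up for illustration. The single function argument plays the role of a query parameter.

```python
import sqlite3


def make_connection():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, country TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "USA", 120.0), (2, "France", 80.0), (3, "USA", 45.5)],
    )
    conn.commit()
    return conn


def orders_by_country(conn, country):
    """Parameterized query bound to a connection; returns (column names, rows)."""
    cursor = conn.execute(
        "SELECT id, country, amount FROM orders WHERE country = ? ORDER BY id",
        (country,),
    )
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


conn = make_connection()
columns, rows = orders_by_country(conn, "USA")
print(columns)
print(rows)
```

Passing the parameter through the `?` placeholder, rather than formatting it into the SQL string, keeps the query safe against injection and makes the parameter list explicit, which is the kind of information a parameter dialog could be generated from.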
With Datagrok, you can retrieve both structured and unstructured data. Datagrok supports multiple data formats, including popular formats like CSV, TXT, JSON, and scientific formats like MAT, molecular structure formats (like PDB, MOL, or SDF), geographic annotation, and others. Datagrok also offers a flexible system for extending the platform with organization-specific data formats (see Extensible framework).
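One common way to support many file formats while leaving room for organization-specific ones is a registry that maps file extensions to reader functions. The sketch below shows that general pattern only; it is not how Datagrok implements file handling. The `reader` decorator, the `open_table` function, and the `.kv` format are all hypothetical names invented for this example.

```python
import csv
import io
import json

# Registry mapping a lowercase file extension to a reader function.
READERS = {}


def reader(*extensions):
    """Decorator that registers a function as the reader for the given extensions."""
    def register(func):
        for ext in extensions:
            READERS[ext.lower()] = func
        return func
    return register


@reader("csv")
def read_csv(text):
    return list(csv.DictReader(io.StringIO(text)))


@reader("json")
def read_json(text):
    return json.loads(text)


@reader("kv")  # hypothetical organization-specific format: one "key = value" pair per line
def read_kv(text):
    row = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            row[key.strip()] = value.strip()
    return [row]


def open_table(filename, text):
    """Pick a reader by file extension and parse the text into a list of row dicts."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in READERS:
        raise ValueError(f"No reader registered for '.{ext}' files")
    return READERS[ext](text)


print(open_table("data.csv", "x,y\n1,2\n"))
print(open_table("settings.kv", "a = 1\nb=2"))
```

Adding a new format in this design means writing one function and decorating it; nothing in `open_table` changes.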
Browsing and preview
Datagrok offers an array of capabilities and features designed to help users efficiently browse, manage, and preview the content of their data. For more information, see:
Sharing and access control
Datagrok treats data connections, file shares, database tables and columns, and queries as entities, which means there is a common set of operations that can be applied to them. These entities can be shared with others, assigned access privileges, commented on, versioned, audited, and so on. Commonly used privileges include share, among others. Privileges can be given to individual users or to groups. For more information on the access privilege model, see Privileges.
Data connections can be shared as part of a project, a package (and the repository containing this package), or as a standalone entity. When you share a query with someone, the database connection associated with it is automatically shared as well, because the query's access rights depend on the access rights of the connection. However, if you share a database connection with someone, your queries aren't shared automatically; you need to share them separately. Web queries, by contrast, are shared automatically when the corresponding connection is shared.
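The sharing rules above can be summarized in a small model. The following Python sketch encodes only the three rules stated in this section; the class names, the `"database"` and `"web"` labels, and the example connection, query, and user names are assumptions made for illustration and do not reflect Datagrok's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    name: str
    kind: str  # "database" or "web" (labels chosen for this sketch)
    shared_with: set = field(default_factory=set)
    queries: list = field(default_factory=list, repr=False, compare=False)


@dataclass
class Query:
    name: str
    connection: Connection
    shared_with: set = field(default_factory=set)


def add_query(connection, name):
    """Create a query bound to a connection."""
    query = Query(name, connection)
    connection.queries.append(query)
    return query


def share_query(query, user):
    """Rule 1: sharing a query also shares the connection it depends on."""
    query.shared_with.add(user)
    query.connection.shared_with.add(user)


def share_connection(connection, user):
    """Rule 2: sharing a database connection does not share its queries.
    Rule 3: sharing a web connection shares its queries as well."""
    connection.shared_with.add(user)
    if connection.kind == "web":
        for query in connection.queries:
            query.shared_with.add(user)


# Hypothetical names throughout.
db = Connection("northwind", "database")
orders = add_query(db, "orders")
customers = add_query(db, "customers")
share_query(orders, "alice")    # alice gets the query and the connection
share_connection(db, "bob")     # bob gets the connection only

web = Connection("rest-api", "web")
list_items = add_query(web, "list-items")
share_connection(web, "carol")  # carol gets the connection and its queries

print(sorted(db.shared_with), sorted(customers.shared_with), sorted(list_items.shared_with))
```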
To learn how to control access for each data source, see the documentation for the corresponding data source.
We designed Datagrok as an extensible environment, where extensions can customize or enhance any part of the platform. For example, you can create custom connectors, add organization-specific data formats, customize menus, add context actions, customize data preview, and more.