Datagrok: Swiss Army Knife for Data

Need quick answers about functionality, troubleshooting, or advanced features? Ask DeepWiki.
Try questions like: "How do I add a regression line to a scatterplot?" or "How do I publish a package?"

Why Datagrok?

Datagrok helps you understand data and take action.

It's fast and powerful: you can load the entire ChEMBL database (2.7 million molecules) in your browser, run substructure searches, apply filters, visualize, and interactively explore the chemical space.

Datagrok goes beyond standard data analytics. You can access data from any source, catalog it, analyze and visualize it, run scientific computations, train and apply models, and do more. Need a specific tool or functionality? Easily integrate or add your own code. Datagrok's plugin architecture makes it easy to deliver cohesive, fit-for-purpose solutions.

Access

Get your data from anywhere: databases, web services, file shares, pipelines. If it's machine-readable, we can work with it!

40+ connectors to all major databases and file shares (or create your own)
Support for OpenAPI and access to public datasets
20+ file formats, 30+ molecule structure formats. Drag-and-drop files to open
Browse relational database schemas
Create, edit, and debug queries with visual tools
Annotate queries and save query results as dynamic dashboards.

Learn more about data access.

Govern

Use catalogs, data lineage tools, audit, and usage analysis to take control. Your data is FAIR and secure.

Control who, what, where, and how: roles, groups, and privileges; flexible authentication; secrets managers
Centralized metadata-annotated catalog of entities. Powerful "everything" browser for managing data, connections, users, and more
Built-in data provenance, data lineage, impact analysis, usage analysis, and audit tools
Global search.

Transform

Automatically generate macros from data transformations and use them on new datasets.

Aggregate, join, filter, and edit data, from the UI or programmatically
Use 500+ available functions, or write your own in JavaScript, Python, or R (or any other language that compiles to WASM)
Record and apply macros, use in pipelines
Visually edit query transformations.
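As a rough illustration of the record-and-replay idea behind macros, here is a minimal Python sketch. It is not Datagrok's implementation: the `Macro` class and the `filter_rows` and `add_column` helpers are hypothetical names invented for this example, and tables are modeled as plain lists of dictionaries.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Table = List[Row]
Step = Callable[[Table], Table]


class Macro:
    """Hypothetical recorder: keeps named transformation steps in order."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Step]] = []

    def record(self, name: str, step: Step) -> "Macro":
        # Store the step and return self so calls can be chained.
        self.steps.append((name, step))
        return self

    def apply(self, table: Table) -> Table:
        # Replay every recorded step, in order, on the given table.
        for _name, step in self.steps:
            table = step(table)
        return table


def filter_rows(predicate: Callable[[Row], bool]) -> Step:
    """Build a step that keeps only rows accepted by the predicate."""
    return lambda table: [row for row in table if predicate(row)]


def add_column(name: str, compute: Callable[[Row], object]) -> Step:
    """Build a step that adds a computed column to every row."""
    return lambda table: [{**row, name: compute(row)} for row in table]


# Record the transformations once...
macro = (
    Macro()
    .record("drop missing weight", filter_rows(lambda r: r["weight"] is not None))
    .record("add weight_kg", add_column("weight_kg", lambda r: r["weight"] / 1000))
)

# ...then replay them on the original data and on a new dataset.
first = [{"id": 1, "weight": 1500}, {"id": 2, "weight": None}]
second = [{"id": 7, "weight": 250}]

print(macro.apply(first))
print(macro.apply(second))
```

Because each step is a plain function from table to table, the same recorded sequence can be dropped into a longer pipeline without changes.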

Learn more about functions.

Explore

Slice, dice, and visualize your data. Render millions of data points interactively and find patterns. Build dynamic dashboards in seconds. Leverage metadata for automated data enrichment and contextual suggestions.

50+ interactive viewers for synchronized, dynamic dashboards
Integration with visualizations in R, Python, or Julia
Built-in regression and formula lines, confidence intervals, correlations, and statistics
Automatic detection of outliers, missing values, or incorrect data types
Adaptive UI and data-specific suggestions.

Compute

Write in any language, annotate, publish, and apply scientific models, methods, and apps. Solve differential equations and run simulations for complex processes.

500+ available functions, or write your own in R, Python, or JavaScript
Metadata-annotated scripts with cross-language support
Scalable and reproducible computations, model lifecycle management
Auto-generated UI.

Learn more about Compute.

Learn

No-code modeling. State-of-the-art cheminformatics engines and ML toolkit included.

Train, assess, apply, and share models (or integrate your own)
Native support for R, Python, Julia, Matlab, and Octave
Open any dataset with a Jupyter notebook
ML toolkit: statistical hypothesis testing, multivariate analysis, dimensionality reduction, data clustering, variance analysis.

Collaborate

Share anything with anyone. Collaborate on decision-making. Use an open source ecosystem to save costs and innovate.

Share within Datagrok or as a URL, or integrate through the REST API, the JS API, or an embedded iframe
50+ open source plugins, including specialized ones for cheminformatics, bioinformatics, NLP, and others
Data annotations, team discussions
Community forum for ideas, support, and feedback.

Extend

Customize anything, from context actions to UI elements. Fast development and deployment time with seamless integration.

JavaScript API for extending Datagrok
App marketplace: use or customize ours, build your own, or integrate with third-party apps
Developer tools, UI toolkit
Comprehensive help: wiki, exercises, community forum.

Who is it for?

Data: Datagrok is optimized for structured, tabular data. It automatically detects semantic types, such as zip codes or molecules, and has built-in support for areas like cheminformatics, bioinformatics, data science, and others. Need more? Create your own plugin.

Skillset: Datagrok is for anyone who works with data:

Chemists analyzing SAR tables? Perfect fit.
Data analysts? Drag and drop your local files to start analyzing.
Data scientists mapping new store locations? Excellent for strategic planning.
Research scientists running complex simulations? Absolutely.
Data engineers? Automatically convert queries to dynamic dashboards, no coding needed.
Developers? Quickly develop and test data-driven applications.

Team size: Datagrok is for individuals and teams of all sizes, from startups to large enterprises. The platform is enterprise-ready, scalable, and ideal for sharing and collaboration.

What makes it so flexible?

Our mission is to help anyone understand their data, even in complex scenarios:

Data that's scattered across various data sources
Data that needs specialized, domain-specific tools
Teams that have different data needs and expertise.

Here's how we do it.

JS API: With the JS API, you aren't confined to pre-built features or interfaces. Add new data formats, connectors, transformations, augmentations, dynamic calculations, UI elements, full-scale applications, workflows, and more. The API also provides seamless integration with data sources and other tools, crucial for large enterprises combating data silos and complex data ecosystems.

Functions: In Datagrok, every task is a function that can be annotated. Annotations make functions versatile, allowing them to work on their own or within larger scripts, no matter the function's language or role. This means you can use functions as building blocks to build on your team's collective expertise while fully leveraging Datagrok's capabilities. (See the cheminformatics example below.)
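To make the idea of annotated functions more concrete, here is a small Python sketch that reads metadata from a comment header at the top of a script. The `#key: value` header format, the sample script, and the `parse_annotations` helper are assumptions made for illustration; check the functions documentation for the annotation syntax Datagrok actually accepts.

```python
import re
from typing import Dict, List

# Illustrative script text. The header keys shown here are assumptions,
# not a specification of Datagrok's annotation format.
SCRIPT = """\
#name: CellCount
#language: python
#input: dataframe table
#output: int count
count = table.shape[0] * table.shape[1]
"""

HEADER_LINE = re.compile(r"^#(\w+):\s*(.+)$")


def parse_annotations(source: str) -> Dict[str, List[str]]:
    """Collect `#key: value` header lines until the first non-header line."""
    annotations: Dict[str, List[str]] = {}
    for line in source.splitlines():
        match = HEADER_LINE.match(line)
        if match is None:
            break  # the header ends where the function body starts
        key, value = match.groups()
        annotations.setdefault(key, []).append(value.strip())
    return annotations


meta = parse_annotations(SCRIPT)
print(meta)
```

With the name, language, inputs, and outputs available as data, a host application could list the function in a catalog, generate an input form for it, or chain it with functions written in other languages.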

Semantic types: Semantic data types provide domain-specific customization:

Automatic detection of domain-specific data types
Domain-specific menus and context actions
Custom data rendering, including spreadsheets and visualizations
Specialized data editing and filtering interfaces
Domain-specific calculation and data processing functions
Fit-for-purpose apps built on top of Datagrok.

See this example for cheminformatics.
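For a simpler, non-chemistry picture of automatic detection, the sketch below guesses a column's semantic type from its values. It is only an assumption about how such a detector could work, not Datagrok's detection logic: the `DETECTORS` table, the regular expressions, and the 90% threshold are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical detectors: each returns True when one value looks like the type.
DETECTORS: Dict[str, Callable[[str], bool]] = {
    "zip_code": lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def detect_semantic_type(values: List[str], threshold: float = 0.9) -> Optional[str]:
    """Return the first type whose detector accepts at least `threshold`
    of the non-empty values, or None when nothing matches."""
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        return None
    for name, looks_like in DETECTORS.items():
        hits = sum(1 for v in present if looks_like(v))
        if hits / len(present) >= threshold:
            return name
    return None


columns = {
    "postal": ["10001", "94105-1234", "", "60614"],
    "contact": ["a@example.com", "b@example.org"],
    "notes": ["call back", "10001"],
}
detected = {name: detect_semantic_type(values) for name, values in columns.items()}
print(detected)
```

Once a column carries a detected type, the type name can serve as the key for choosing renderers, filters, and context actions for that column.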

What makes it so fast?

Our goal is to let you explore at the speed of thought. To achieve this, we designed Datagrok from scratch:

Data engine: In-memory columnar database that runs both on the server and in the web browser. Fast random access, efficient data storage, aggregation, compression, filtering, transformation, and caching.
Native viewers: Access the data engine directly for maximum performance. They share statistics and cached calculations, and cooperate on tasks like filtering or selection.
App server: Uses the data engine to exchange binary-optimized datasets with the client. Custom ORM to efficiently work with metadata in Postgres.
Compute engine: Supports multiple languages working with binary-optimized datasets. Scales well. GPU acceleration of ML routines. Supports custom Docker containers.
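The columnar layout and the shared filter state described above can be pictured with a toy Python sketch. This is an illustration under stated assumptions, not Datagrok's data engine: the `ColumnarTable` class, its methods, and the sample column names are hypothetical.

```python
from array import array
from typing import Dict


class ColumnarTable:
    """Toy column store: each column is one contiguous typed array."""

    def __init__(self, columns: Dict[str, array]) -> None:
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.row_count = lengths.pop() if lengths else 0
        # One shared mask (1 = row passes the filter). Every consumer,
        # such as a viewer, reads the same bytes instead of re-filtering.
        self.filter_mask = bytearray([1]) * self.row_count

    def apply_filter(self, column: str, low: float, high: float) -> None:
        """Replace the shared mask with a range filter on one column."""
        values = self.columns[column]
        for i in range(self.row_count):
            self.filter_mask[i] = 1 if low <= values[i] <= high else 0

    def mean(self, column: str) -> float:
        """Average one column over the rows that pass the shared filter."""
        values = self.columns[column]
        selected = [values[i] for i in range(self.row_count) if self.filter_mask[i]]
        return sum(selected) / len(selected) if selected else float("nan")


table = ColumnarTable({
    "mw": array("d", [180.2, 320.5, 410.0, 95.1]),
    "logp": array("d", [1.0, 3.0, 5.0, 0.0]),
})
table.apply_filter("mw", 100.0, 400.0)
print(list(table.filter_mask))
print(table.mean("logp"))
```

Filtering on `mw` touches only that column's array, and the aggregate on `logp` reuses the resulting mask, which is the kind of cooperation between components that the list above describes.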

Learn more about Datagrok's architecture and performance optimization.

Solutions