Datagrok: Swiss Army Knife for Data
Why Datagrok?
Datagrok helps you understand data and take action.
It's fast and powerful: you can load the entire ChEMBL database (2.7 million molecules) in your browser, run substructure searches, apply filters, visualize, and interactively explore the chemical space.
Datagrok goes beyond standard data analytics. You can access data from any source, catalog it, analyze and visualize it, run scientific computations, train and apply models, and more. Need a specific tool or feature? Integrate it, or add your own code. Datagrok's plugin architecture makes it easy to deliver cohesive, fit-for-purpose solutions.
Access
Get your data from anywhere: databases, web services, file shares, pipelines. If it's machine-readable, we can work with it!
- 40+ connectors to all major databases and file shares (or create your own)
- Support for OpenAPI and access to public datasets
- 20+ file formats, 30+ molecule structure formats. Drag-and-drop files to open
- Browse relational database schemas
- Create, edit, and debug queries with visual tools
- Annotate queries and save query results as dynamic dashboards.
Govern
Use catalogs, data lineage tools, audit, and usage analysis to take control. Your data is FAIR and secure.
- Control who, what, where, and how: roles, groups, and privileges; flexible authentication; secrets managers
- Centralized metadata-annotated catalog of entities. Powerful "everything" browser for managing data, connections, users, and more
- Built-in data provenance, data lineage, impact analysis, usage analysis, and audit tools
- Global search.
Transform
Automatically generate macros from data transformations and use them on new datasets.
- Aggregate, join, filter, and edit data, from the UI or programmatically
- Use 500+ available functions, or write your own in JavaScript, Python, R, or any language that compiles to WebAssembly (WASM)
- Record and apply macros, use in pipelines
- Visually edit query transformations.
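The record-and-replay idea behind macros can be sketched in a few lines of TypeScript. This is a hypothetical illustration, not Datagrok's macro format or API: the `Step` type, the `MacroRecorder` class, and the `replay` function are assumptions. Each transformation is stored as plain data (an operation name plus its parameters), so the same recorded list can be re-run on a new dataset or inside a pipeline.

```typescript
// Hypothetical row-oriented table, used only for this illustration.
type Row = Record<string, number | string>;
type Table = Row[];

// A recorded transformation step: an operation name plus its parameters.
type Step =
  | { op: "filter"; column: string; min: number }
  | { op: "addColumn"; name: string; left: string; right: string };

// Apply one step to a table and return a new table (the input is not mutated).
function applyStep(t: Table, s: Step): Table {
  if (s.op === "filter") {
    const { column, min } = s;
    return t.filter((r) => Number(r[column]) >= min);
  }
  // The only other case is "addColumn": store left + right in a new column.
  const { name, left, right } = s;
  return t.map((r) => ({ ...r, [name]: Number(r[left]) + Number(r[right]) }));
}

// Applies steps interactively and remembers them as a macro.
class MacroRecorder {
  readonly steps: Step[] = [];

  apply(t: Table, s: Step): Table {
    this.steps.push(s);
    return applyStep(t, s);
  }
}

// Replay a recorded macro on another dataset.
function replay(steps: Step[], t: Table): Table {
  return steps.reduce<Table>((acc, s) => applyStep(acc, s), t);
}

// Usage: record two steps on one dataset...
const recorder = new MacroRecorder();
let data: Table = [{ a: 1, b: 2 }, { a: 5, b: 1 }];
data = recorder.apply(data, { op: "filter", column: "a", min: 2 });
data = recorder.apply(data, { op: "addColumn", name: "sum", left: "a", right: "b" });

// ...then replay them on a new dataset with the same columns.
const replayed = replay(recorder.steps, [{ a: 3, b: 4 }, { a: 0, b: 9 }]);
console.log(replayed); // [ { a: 3, b: 4, sum: 7 } ]
```

Keeping steps as data rather than closures is a design choice that makes a macro easy to serialize, inspect, and edit later.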
Explore
Slice, dice, and visualize your data. Render millions of data points interactively and find patterns. Build dynamic dashboards in seconds. Leverage metadata for automated data enrichment and contextual suggestions.
- 50+ interactive viewers for synchronized, dynamic dashboards
- Integration with visualizations in R, Python, or Julia
- Built-in regression and formula lines, confidence intervals, correlations, and statistics
- Automatic detection of outliers, missing values, or incorrect data types
- Adaptive UI and data-specific suggestions.
Compute
Write scientific models, methods, and apps in any language, then annotate, publish, and apply them. Solve differential equations and run simulations for complex processes.
- Use 500+ available functions, or write your own in R, Python, or JavaScript
- Metadata-annotated scripts with cross-language support
- Scalable and reproducible computations, model lifecycle management
- Auto-generated UI.
Learn more about Compute.
Learn
No-code modeling. State-of-the-art cheminformatics engines and an ML toolkit included.
- Train, assess, apply, and share models (or integrate your own)
- Native support for R, Python, Julia, Matlab, and Octave
- Open any dataset in a Jupyter notebook
- ML toolkit: statistical hypothesis testing, multivariate analysis, dimensionality reduction, data clustering, variance analysis.
Collaborate
Share anything with anyone. Collaborate on decision-making. Use an open-source ecosystem to save costs and innovate.
- Share within Datagrok or via a URL, or integrate through the REST API, the JS API, or an embedded iframe
- 50+ open-source plugins, including specialized ones for cheminformatics, bioinformatics, NLP, and other domains
- Data annotations, team discussions
- Community forum for ideas, support, and feedback.
Extend
Customize anything, from context actions to UI elements. Develop and deploy quickly, with seamless integration.
- JavaScript API for extending Datagrok
- App marketplace: use or customize ours, build your own, or integrate with third-party apps
- Developer tools, UI toolkit
- Comprehensive help: wiki, exercises, community forum.
Who is it for?
Data: Datagrok is optimized for structured, tabular data. It automatically detects semantic types, such as zip codes or molecules, and has built-in support for domains like cheminformatics, bioinformatics, data science, and others. Need more? Create your own plugin.
Skillset: Datagrok is for anyone who works with data:
- Chemists analyzing SAR tables? Perfect fit.
- Data analysts? Drag and drop your local files to start analyzing.
- Data scientists mapping new store locations? Excellent for strategic planning.
- Research scientists running complex simulations? Absolutely.
- Data engineers? Automatically convert queries to dynamic dashboards, no coding needed.
- Developers? Quickly develop and test data-driven applications.
Team size: Datagrok is for individuals and teams of all sizes, from startups to large enterprises. The platform is enterprise-ready, scalable, and ideal for sharing and collaboration.
What makes it so flexible?
Our mission is to help anyone understand their data, even in complex scenarios:
- Data that's scattered across various data sources
- Data that needs specialized, domain-specific tools
- Teams that have different data needs and expertise.
Here's how we do it.
JS API: With the JS API, you aren't confined to pre-built features or interfaces. Add new data formats, connectors, transformations, augmentations, dynamic calculations, UI elements, full-scale applications, workflows, and more. The API also integrates seamlessly with data sources and other tools, which is crucial for large enterprises combating data silos and complex data ecosystems.
Functions: In Datagrok, every task is a function that can be annotated. Annotations make functions versatile: they can run on their own or inside larger scripts, regardless of the function's language or role. This lets you use functions as building blocks that capture your team's collective expertise while fully leveraging Datagrok's capabilities. (See the cheminformatics example below.)
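To make the idea of annotated functions concrete, here is a minimal TypeScript sketch. It assumes a header-comment annotation style (`//name:`, `//input:`, `//output:`) chosen for illustration; the exact syntax, the `parseAnnotations` parser, and the `FunctionRegistry` class are hypothetical and are not Datagrok's JS API. The point is that once a function's inputs and outputs are described by metadata, it can be discovered and chained by name, independent of how it was implemented.

```typescript
// Metadata extracted from a function's annotation header.
interface FuncMeta {
  name: string;
  inputs: { type: string; name: string }[];
  output: { type: string; name: string } | null;
}

// Parse "//key: value" header lines into metadata (annotation syntax assumed).
function parseAnnotations(header: string): FuncMeta {
  const meta: FuncMeta = { name: "", inputs: [], output: null };
  for (const line of header.split("\n")) {
    const m = /^\s*\/\/(\w+):\s*(.*)$/.exec(line);
    if (!m) continue;
    const [, key, value] = m;
    const [type, argName] = value.trim().split(/\s+/);
    if (key === "name") meta.name = value.trim();
    else if (key === "input") meta.inputs.push({ type, name: argName });
    else if (key === "output") meta.output = { type, name: argName };
  }
  return meta;
}

// Hypothetical implementation signature: numeric inputs, numeric result.
type Impl = (...args: number[]) => number;

// Registry that stores functions together with their annotations.
class FunctionRegistry {
  private funcs = new Map<string, { meta: FuncMeta; impl: Impl }>();

  register(header: string, impl: Impl): void {
    const meta = parseAnnotations(header);
    this.funcs.set(meta.name, { meta, impl });
  }

  // Call by name with named arguments, ordered by the annotated inputs.
  call(name: string, args: Record<string, number>): number {
    const f = this.funcs.get(name);
    if (!f) throw new Error(`Unknown function: ${name}`);
    return f.impl(...f.meta.inputs.map((i) => args[i.name]));
  }
}

// Usage: register two annotated functions and compose them by name.
const registry = new FunctionRegistry();
registry.register("//name: add\n//input: double a\n//input: double b\n//output: double sum", (a, b) => a + b);
registry.register("//name: square\n//input: double x\n//output: double y", (x) => x * x);

const result = registry.call("square", { x: registry.call("add", { a: 2, b: 3 }) });
console.log(result); // 25
```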
Semantic types: Semantic data types provide domain-specific customization:
- Automatic detection of domain-specific data types
- Domain-specific menus and context actions
- Custom data rendering in spreadsheets and visualizations
- Specialized data editing and filtering interfaces
- Domain-specific calculation and data processing functions
- Fit-for-purpose apps built on top of Datagrok.
See this example for cheminformatics.
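A minimal sketch of how automatic semantic-type detection can work: look at a column's non-empty values, run a set of detectors, and tag the column with the first type that matches most of them. This is a hypothetical TypeScript illustration, not Datagrok's detector API; the detector list, the `detectSemType` function, and the 90% threshold are assumptions. The molecule check is a deliberately naive SMILES-like pattern; a real detector would validate structures with a cheminformatics toolkit. Once a column carries a semantic type, menus, renderers, and functions can be looked up by that type.

```typescript
// A detector decides whether a single value looks like a given semantic type.
interface SemTypeDetector {
  semType: string;
  test: (value: string) => boolean;
}

// Detectors are tried in order; more specific ones should come first.
const detectors: SemTypeDetector[] = [
  // US ZIP code: 5 digits, optionally followed by -4 digits.
  { semType: "zip_code", test: (v) => /^\d{5}(-\d{4})?$/.test(v) },
  {
    semType: "molecule",
    // Naive SMILES-like heuristic: allowed characters only, and at least one letter.
    test: (v) => /^[BCNOPSFIclnosp0-9()\[\]=#@+\-\/\\%.]+$/.test(v) && /[A-Za-z]/.test(v),
  },
];

// Return the first semantic type matching at least `minShare` of non-empty values.
function detectSemType(values: string[], minShare = 0.9): string | null {
  const sample = values.filter((v) => v.trim() !== "");
  if (sample.length === 0) return null;
  for (const d of detectors) {
    const hits = sample.filter(d.test).length;
    if (hits / sample.length >= minShare) return d.semType;
  }
  return null;
}

console.log(detectSemType(["02139", "94105", "10001-1234"])); // zip_code
console.log(detectSemType(["CCO", "c1ccccc1", "CC(=O)O"])); // molecule
console.log(detectSemType(["apple", "banana"])); // null
```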
What makes it so fast?
Our goal is to let you explore at the speed of thought. To achieve this, we designed Datagrok from scratch:
- Data engine: In-memory columnar database that runs on both server and web browser. Fast random access, efficient data storage, aggregation, compression, filtering, transformation, and caching.
- Native viewers: Access the data engine directly for maximum performance. They share statistics, cached calculations, and cooperate on tasks like filtering or selection.
- App server: Uses the data engine to exchange binary-optimized datasets with the client. Custom ORM to efficiently work with metadata in Postgres.
- Compute engine: Supports multiple languages working with binary-optimized datasets. Scales well. GPU acceleration of ML routines. Supports custom Docker containers.
Learn more about Datagrok's architecture and performance optimization.
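The TypeScript sketch below illustrates, in a generic way, three ideas behind the data engine and native viewers: each column is stored contiguously in a typed array; a filter is a bitset, so combining filters is a word-wise AND that processes 32 rows per operation; and column statistics are computed once and cached so several viewers can reuse them. The `Float64Column` and `BitSet` classes and the `rangeFilter` helper are hypothetical and do not describe Datagrok's internal implementation.

```typescript
// Column statistics that several viewers might need.
interface Stats {
  min: number;
  max: number;
  mean: number;
}

// One column of doubles stored contiguously (columnar layout).
class Float64Column {
  private cachedStats: Stats | null = null;

  constructor(readonly name: string, readonly data: Float64Array) {}

  // Scan the column once; later calls (e.g. from other viewers) reuse the cache.
  stats(): Stats {
    if (this.cachedStats === null) {
      let min = Infinity;
      let max = -Infinity;
      let sum = 0;
      this.data.forEach((v) => {
        if (v < min) min = v;
        if (v > max) max = v;
        sum += v;
      });
      const mean = this.data.length > 0 ? sum / this.data.length : NaN;
      this.cachedStats = { min, max, mean };
    }
    return this.cachedStats;
  }
}

// Fixed-size bitset: bit i set means row i passes the filter.
class BitSet {
  readonly words: Uint32Array;

  constructor(readonly length: number) {
    this.words = new Uint32Array(Math.ceil(length / 32));
  }

  set(i: number): void {
    this.words[i >>> 5] |= 1 << (i & 31);
  }

  get(i: number): boolean {
    return (this.words[i >>> 5] & (1 << (i & 31))) !== 0;
  }

  // Combining two filters handles 32 rows per bitwise AND.
  and(other: BitSet): BitSet {
    const out = new BitSet(this.length);
    for (let w = 0; w < this.words.length; w++) {
      out.words[w] = this.words[w] & other.words[w];
    }
    return out;
  }

  count(): number {
    let c = 0;
    for (let i = 0; i < this.length; i++) if (this.get(i)) c++;
    return c;
  }
}

// Mark rows whose value lies in [lo, hi].
function rangeFilter(col: Float64Column, lo: number, hi: number): BitSet {
  const bits = new BitSet(col.data.length);
  col.data.forEach((v, i) => {
    if (v >= lo && v <= hi) bits.set(i);
  });
  return bits;
}

// Usage: two numeric columns, two range filters combined into one selection.
const mw = new Float64Column("MW", new Float64Array([180.2, 46.1, 310.4, 500.9, 94.1]));
const logP = new Float64Column("LogP", new Float64Array([1.2, -0.3, 3.5, 5.8, 1.5]));

const selected = rangeFilter(mw, 50, 400).and(rangeFilter(logP, 0, 4));
console.log(selected.count()); // 3

// The second call returns the cached object instead of rescanning the column.
console.log(mw.stats() === mw.stats()); // true
```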
Solutions
- Self-service analytics
- Data science
- Life sciences
- NLP
- Enterprise IT
- Plugins