
Exploratory data analysis

Before we can learn from data, we need to understand it. Exploratory data analysis (EDA) is the process of initially investigating data to discover patterns, spot anomalies, test hypotheses, and check assumptions.

By its nature, EDA is visually driven. Most of today's datasets are too big, too complex, and too diverse to be explored in a tabular format or by statistical means alone. Humans, on the other hand, evolved to understand complex information visually and are better than computers at detecting patterns and anomalies.

Interactivity is key. We often do not know what we are looking for at the start; we extract knowledge from the data and update our understanding as we go. To uncover insights that might otherwise go unnoticed, we need to be able to quickly change both what we are viewing and how we are viewing it:

  • Look at data from multiple perspectives at once
  • Zoom in and filter
  • Manipulate, edit, and add data
  • Get details on demand
  • Select rows of interest and see how they compare to other row sets
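To make the last two ideas concrete, here is a minimal sketch in plain Python of filtering a table and then comparing a selection against the remaining rows. It does not use Datagrok's API: the toy `rows` table and the helper names `filter_rows` and `compare_selection` are hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical toy table: each row is a plain dict (not Datagrok's data model).
rows = [
    {"store": "A", "region": "north", "sales": 120.0},
    {"store": "B", "region": "north", "sales": 95.0},
    {"store": "C", "region": "south", "sales": 210.0},
    {"store": "D", "region": "south", "sales": 180.0},
    {"store": "E", "region": "west", "sales": 60.0},
]


def filter_rows(table, predicate):
    """Zoom in: keep only the rows that satisfy the predicate."""
    return [row for row in table if predicate(row)]


def compare_selection(table, predicate, column):
    """Split rows into selected and unselected sets and compare the column mean."""
    selected = [row[column] for row in table if predicate(row)]
    rest = [row[column] for row in table if not predicate(row)]
    return {
        "selected_mean": mean(selected) if selected else None,
        "rest_mean": mean(rest) if rest else None,
    }


# Filter out low-volume stores, then compare the "south" selection to the rest.
filtered = filter_rows(rows, lambda r: r["sales"] >= 90.0)
summary = compare_selection(filtered, lambda r: r["region"] == "south", "sales")
print(summary)
```

In an interactive tool, the same two steps happen through the UI: a filter narrows the visible rows, and a selection highlights one row set against the others in every open viewer.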

We designed Datagrok from the ground up for visually driven EDA of big, complex datasets. Unlike other tools that use a conventional client-server architecture, Datagrok's proprietary in-memory database makes it possible to analyze millions of columns and billions of rows at the speed of thought right in your browser.

With Datagrok, you can:

You can also leverage Datagrok's component-based architecture to extend or create any element you like. For example, you can add custom viewers or develop new functions in R, Python, or Julia.

Each of these actions can be automated and used in pipelines. Sharing the results of your analysis is easy and secure.

With Datagrok, anyone can use their domain knowledge and perceptive abilities to explore data and uncover its meaning.

Examples

Interactive Data Visualization

An overview of some of the visualization capabilities of the Datagrok platform, including the concepts of views, viewers, selection, filter, and layouts.

Coffee Company

How do we choose the best location for a new coffee place, given the historical sales data? Datagrok to the rescue! In less than 20 minutes, we achieve the following:
• Retrieve historical data from the Postgres database
• Explore, visualize, and clean the dataset
• Impute missing values
• Extract census data from the long/lat coordinates
• Perform multivariate analysis
• Build multiple predictive models and assess their performance
• Build an interactive map for predicting sales
• Deploy the results as an app to all users in our company
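As an illustration of one step in this workflow, imputing missing values, here is a minimal sketch of mean imputation in plain Python. It is not how Datagrok implements imputation; the function name `impute_mean` and the sample `weekly_sales` values are hypothetical, and mean imputation is just one simple strategy among many.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to estimate from, so return the values unchanged.
        return list(values)
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sales figures with two missing entries.
weekly_sales = [100.0, None, 140.0, 120.0, None]
imputed = impute_mean(weekly_sales)
print(imputed)
```

Mean imputation keeps the column average unchanged but shrinks its variance, so it is worth comparing the distributions before and after imputation when exploring the data.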

See also