
Scatterplot

A scatterplot displays data points on the X and Y axes to show the relationship between two variables. By using marker color, shape, and size, you can show up to three additional data dimensions. A scatterplot is a chemically-aware viewer and is a great choice to visualize chemical space or activity cliffs.

Controls

Action | Control
Context menu | Right-click
Zoom | Alt+Mouse Drag
Zoom in | Mouse Wheel Up or Plus
Zoom out | Mouse Wheel Down or Minus
Reset view | Double-click
Select | Shift+Mouse Drag, Ctrl+Click, Shift+Click
Invert selected | Ctrl+Mouse Click
Scroll | Up, Down, Left, Right
Toggle lasso tool | L
Toggle regression line | R
Show in full screen | Alt+F

Adding and configuring a scatterplot

To add a scatterplot, click the Scatterplot icon on the Toolbox.

Use the viewer controls to select columns for each axis and for marker color and size. You can also drag and drop columns from the grid onto the scatterplot. For additional configuration, click the Gear icon on top of the viewer and set your preferences in the Context Panel.

You can also access key settings from the context menu by right-clicking. Element-specific context menus appear when you click on a legend, an axis, or a label.

Data source

To specify the rows to show on the scatter plot, use the "Table" and "Row Source" properties in the Data section on the Context Panel.

  • "Table" to visualize a table other than the current one
  • "Row Source" to visualize a subset of data:
    • "Filtered" (the default value) - scatter plot follows view filter
    • "All", "Selected", "SelectedOrCurrent", "FilteredSelected", "MouseOverGroup", "CurrentRow", "MouseOverRow" - other options useful for providing interactivity

You can further filter visible rows by setting the "Filter" property to an expression, such as ${AGE} > 18.
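The interplay between "Row Source" and "Filter" can be sketched in a few lines of Python. This is only an illustration of the idea, not the platform's implementation: the function `visible_rows`, the mask arguments, and the sample data are hypothetical, only four row sources are modeled, "FilteredSelected" is assumed to mean rows that are both filtered and selected, and the filter expression is stood in for by a plain Python predicate.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]


def visible_rows(
    rows: List[Row],
    filtered: List[bool],
    selected: List[bool],
    row_source: str = "Filtered",
    predicate: Optional[Callable[[Row], bool]] = None,
) -> List[int]:
    """Return indices of rows that would be drawn (hypothetical helper)."""
    if row_source == "All":
        base = [True] * len(rows)
    elif row_source == "Filtered":
        base = list(filtered)
    elif row_source == "Selected":
        base = list(selected)
    elif row_source == "FilteredSelected":
        # Assumption: intersection of the filtered and selected rows.
        base = [f and s for f, s in zip(filtered, selected)]
    else:
        raise ValueError(f"row source not modeled in this sketch: {row_source}")

    # The "Filter" expression narrows the row source further.
    return [
        i for i, row in enumerate(rows)
        if base[i] and (predicate is None or predicate(row))
    ]


# Made-up sample data.
rows = [{"AGE": 15}, {"AGE": 22}, {"AGE": 40}, {"AGE": 17}]
filtered = [True, True, False, True]
selected = [False, True, True, False]

# Counterpart of Row Source = "Filtered" with Filter = ${AGE} > 18.
adults = visible_rows(rows, filtered, selected, "Filtered", lambda r: r["AGE"] > 18)
print(adults)
```

The point of the sketch is that the expression is applied on top of the row source, so a row must pass both to appear on the plot.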

X, Y, Colors, Sizes, Markers

Use the column selectors on top of the scatterplot and the popup menu to set up basic properties.

Filtering

In addition to visualizing filtered rows, a scatterplot can also filter the table, which in turn affects what you see in the other viewers of this view. This behavior is controlled by the "Zoom and Filter" property:

  • "filter by zoom" (default): as you zoom in, the global view filter changes to show only rows that are visible on the scatterplot. In this mode, the "Filter Out Invalid" property defines whether rows that cannot be visualized on the scatterplot (such as negative values on log scales) should be filtered out.
  • "no action": zooming in does not affect the view filter
  • "zoom by filter": as the view filter changes, the scatterplot zooms in to the minimum area containing the filtered points. This is useful for analyzing clusters of data.
  • "pack and zoom by filter": mostly the same as "zoom by filter", but when categorical values are shown on an axis and some categories are completely filtered out, these categories are removed (packed) from the axis. Useful when visualizing data that has a large number of categories.
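The three active modes above differ in which side drives the other. A minimal Python sketch of that logic follows; it assumes a rectangular viewport and numeric axes, and the function names and sample values are hypothetical rather than part of the product.

```python
from typing import List, Optional, Sequence, Tuple

Viewport = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def filter_by_zoom(xs: Sequence[float], ys: Sequence[float],
                   viewport: Viewport) -> List[bool]:
    """Zoom drives the filter: keep only rows inside the visible area."""
    x_min, x_max, y_min, y_max = viewport
    return [x_min <= x <= x_max and y_min <= y <= y_max
            for x, y in zip(xs, ys)]


def zoom_by_filter(xs: Sequence[float], ys: Sequence[float],
                   mask: Sequence[bool]) -> Optional[Viewport]:
    """Filter drives the zoom: smallest area containing the filtered points."""
    kept = [(x, y) for x, y, keep in zip(xs, ys, mask) if keep]
    if not kept:
        return None  # nothing passes the filter, so there is no area to zoom to
    kept_x = [p[0] for p in kept]
    kept_y = [p[1] for p in kept]
    return (min(kept_x), max(kept_x), min(kept_y), max(kept_y))


def pack_categories(categories: Sequence[str],
                    mask: Sequence[bool]) -> List[str]:
    """Packing: drop categories with no filtered-in rows (first-seen order)."""
    packed: List[str] = []
    for category, keep in zip(categories, mask):
        if keep and category not in packed:
            packed.append(category)
    return packed


# Made-up sample data.
xs = [1, 2, 3, 4]
ys = [10, 20, 30, 40]
mask = filter_by_zoom(xs, ys, (1.5, 3.5, 0, 100))
area = zoom_by_filter(xs, ys, mask)
axis = pack_categories(["a", "b", "a", "c"], [True, False, True, True])
```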

Selection

Scatterplot highlights selected rows in yellow, and lets you select points as well:

  • To select area: Shift + mouse-drag
  • To unselect area: Ctrl + Shift + mouse-drag
  • To toggle point selection: Ctrl + click

To switch between lasso and rectangular selection modes, press L or click Lasso Tool from the context menu.
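To illustrate how area selection can work in both modes, here is a small Python sketch based on the standard ray-casting point-in-polygon test. It is an assumption that selection can be modeled this way; the source does not describe the actual algorithm. A rectangle is treated as a four-vertex polygon, so one routine covers both the rectangular and the lasso tools. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count crossings of a horizontal ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge crosses the ray's line only if its ends lie on different sides.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_area(points: Sequence[Point], selected: Sequence[bool],
                polygon: Sequence[Point], unselect: bool = False) -> List[bool]:
    """Shift+drag selects points inside the area; Ctrl+Shift+drag unselects them."""
    return [
        (not unselect) if point_in_polygon(px, py, polygon) else was
        for (px, py), was in zip(points, selected)
    ]


def toggle_point(selected: Sequence[bool], index: int) -> List[bool]:
    """Ctrl+click flips the selection state of a single point."""
    result = list(selected)
    result[index] = not result[index]
    return result


points = [(1, 1), (5, 5), (2, 3)]
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
small = [(0, 0), (1.5, 0), (1.5, 1.5), (0, 1.5)]

state = select_area(points, [False, False, False], square)
state_after_unselect = select_area(points, state, small, unselect=True)
state_after_toggle = toggle_point(state_after_unselect, 1)
```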

Regression lines

To show a regression line, press R or check the "Show Regression Line" property on the context panel. To hide the equation, uncheck "Show Regression Line Equation".
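For readers who want to see what a regression line and its equation represent, the sketch below fits a straight line with ordinary least squares and computes the Pearson correlation and root mean square error. The source does not state which fitting method the viewer uses, so ordinary least squares is an assumption here, and the function names and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def linear_regression(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def rmse(xs: Sequence[float], ys: Sequence[float],
         slope: float, intercept: float) -> float:
    """Root mean square error of the fitted line against the data."""
    residuals = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(residuals) / len(residuals))


# Made-up sample data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = linear_regression(xs, ys)
equation = f"y = {slope:.2f} * x + {intercept:.2f}"
```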

Formula lines

A scatterplot can show reference lines that represent formulas or equations. These lines are used to emphasize specific areas on the chart or data. Common examples include a regression line, value bands, and so on.

To show a custom formula line, right-click a scatterplot, then choose Tools > Formula Lines... This action opens a Formula Lines dialog. Here, enter your formula and configure the line settings. Your formula should refer to the columns on the X and Y axes. The syntax for the formula is similar to that used to Add New Column.

Tooltip

By default, a scatterplot inherits the tooltip from the grid. However, you can customize the scatterplot's tooltip to show the data you want using the Tooltip info pane on the Context Panel. To configure a custom tooltip:

  1. Enable custom tooltip: Set Show Tooltip to Show custom tooltip.

  2. Choose which columns to display: In Row Tooltip, select the columns whose values you want to show in the tooltip.

  3. Control axis values in the tooltip: In Data Values, specify how axis values should appear:

    • To exclude axis values from the tooltip, choose Do not add.
    • To show only axis values, choose Data values only.
    • To add axis values, choose Merge.

In addition, a scatterplot itself can be used as a group tooltip, which may be especially useful when dealing with grouped or clustered data or when the screen space is limited.

To make scatterplot appear in a tooltip when you hover over a row group, right-click on the scatterplot and select Tooltip > Use as Group Tooltip.

Labels

To show values next to the markers, configure the Labels settings from either the context menu or the properties panel:

  • To select columns to show, expand Label Form and check or drag-and-drop columns
  • To select a subset of rows to show, use Show Labels For
    • You can drag-and-drop labels to exact positions in the Selected or Current modes
  • Use Label as Marker renders a centered label instead of the marker. This is particularly useful for zooming in on molecular datasets (points become molecules).
  • Check Show Column Names to show column names next to the values

To quickly adjust settings for labels, right-click on the label.

Jitter

Many points in your dataset might share the same X and Y coordinates (this often happens with integer or categorical columns). To spread them out on the plot, set Jitter Size.
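The effect of jitter can be sketched as a random pixel offset applied to each marker. The sketch below assumes offsets drawn uniformly from a symmetric range, which the source does not specify. It follows the documented rule that when Jitter Size Y is set, Jitter Size shifts x only. The function name and the `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def jitter(points: Sequence[Point], jitter_size: float,
           jitter_size_y: Optional[float] = None, seed: int = 0) -> List[Point]:
    """Shift each marker by up to jitter_size pixels (uniform offsets assumed)."""
    rng = random.Random(seed)  # seeded so the layout is reproducible
    # Without a separate Y value, the same limit applies to both axes.
    y_limit = jitter_size if jitter_size_y is None else jitter_size_y
    return [
        (x + rng.uniform(-jitter_size, jitter_size),
         y + rng.uniform(-y_limit, y_limit))
        for x, y in points
    ]


# Five markers stacked on the same screen position.
stacked = [(100.0, 50.0)] * 5
spread = jitter(stacked, jitter_size=3)
spread_x_only = jitter(stacked, jitter_size=3, jitter_size_y=0)
```

Jitter changes only where markers are drawn; the underlying data values stay the same.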

Connecting lines

You can set a column that defines the order in which points are connected. For example, this can show the (GDP, life expectancy) trajectory of different countries over time.
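Conceptually, connecting lines amount to grouping rows into series and sorting each series by the order column. The Python sketch below shows that idea under the assumption that the series come from a categorical column, as the Lines Order Column Name property describes. The function name, column names, and numbers are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def build_polylines(rows: List[Row], x: str, y: str,
                    order: str, series: str) -> Dict[object, List[Tuple[object, object]]]:
    """Group rows by series, then order each group's points by the order column."""
    groups: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[series]].append(row)
    return {
        key: [(r[x], r[y]) for r in sorted(group, key=lambda r: r[order])]
        for key, group in groups.items()
    }


# Made-up values; rows are deliberately out of chronological order.
rows = [
    {"country": "A", "year": 2010, "gdp": 12.0, "life": 71.0},
    {"country": "B", "year": 2000, "gdp": 30.0, "life": 78.0},
    {"country": "A", "year": 2000, "gdp": 10.0, "life": 70.0},
    {"country": "B", "year": 2010, "gdp": 33.0, "life": 80.0},
]
lines = build_polylines(rows, x="gdp", y="life", order="year", series="country")
```

Each resulting list is one trajectory, drawn from the earliest to the latest value of the order column.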

Cheminformatics

Scatterplot supports custom value renderers, which makes it particularly useful for visualizing high-dimensional chemical data. Molecules can be rendered on axes, as labels, or in tooltips. To learn more, check out cheminformatics.

WebGPU acceleration

WebGPU acceleration allows you to quickly render massive datasets (10 million rows and more). To get the maximum performance, set the Zoom and Filter property to "no action".

This feature is currently in beta. To enable it, check Settings > Beta > Enable Scatter Plot Web GPU Acceleration.

Videos

ScatterPlot

Properties

Property | Type | Description
Data
Filter Out Invalid | boolean | Invalid values are nulls and, if the axis is logarithmic, non-positive numbers.
Show Filtered Out Points | boolean | When true, filtered out points are rendered using Filtered Out Rows Color.
Axes Follow Filter | boolean | When true, scatter plot will zoom to an area defined by the range filters for X and Y columns, even if Zoom And Filter property is not set to Zoom by Filter.
Zoom And Filter | string | Determines the relationship between table filter and scatter plot area. No action: they are disconnected. Filter by zoom: scatter plot acts as a filter; as you zoom in, points get filtered out. Zoom by filter: scatter plot focuses on the filtered points as the filter changes. Pack and zoom by filter: removes filtered out categories and focuses on the filtered points as the filter changes.
Filter | string | Formula that filters the rows to show. Examples: ${AGE} > 20 or ${WEIGHT} / 2 > 100, ${SEVERITY} == 'Medium', ${RACE}.endsWith('sian')
Table | string |
X
X Column Name | string | A column to use on the X axis. Could be numerical or categorical.
X Map | string | Time unit map function for x column (applicable to dates only).
X Axis Type | string |
Invert X Axis | boolean |
X Min | number |
X Max | number |
Show Vertical Grid Lines | boolean |
Show X Axis | boolean |
Show X Selector | boolean |
X Whisker Min Column Name | string | Point lower bound for x axis whiskers. Selecting it disables X Whisker Range.
X Whisker Max Column Name | string | Point upper bound for x axis whiskers. Selecting it disables X Whisker Range.
X Whisker Range Column Name | string | Point range for x axis whiskers. Applied only if X Whisker Min and X Whisker Max are not set.
X Axis Label Orientation | string |
Y
Y Column Name | string | A column to use on the Y axis. Could be numerical or categorical.
Y Map | string | Time unit map function for y column (applicable to dates only).
Y Axis Type | string |
Invert Y Axis | boolean |
Y Min | number |
Y Max | number |
Show Horizontal Grid Lines | boolean |
Show Y Axis | boolean |
Show Y Selector | boolean |
Y Whisker Min Column Name | string | Point lower bound for y axis whiskers. Selecting it disables Y Whisker Range.
Y Whisker Max Column Name | string | Point upper bound for y axis whiskers. Selecting it disables Y Whisker Range.
Y Whisker Range Column Name | string | Point range for y axis whiskers. Applied only if Y Whisker Min and Y Whisker Max are not set.
Axes
Show X Histogram | boolean | Shows a distribution histogram along the X axis (at the top).
Show Y Histogram | boolean | Shows a distribution histogram along the Y axis (on the right).
Histogram Bins | number | Number of bins for axis histograms.
Color
Color Column Name | string | A column to be used for color-coding. Could be numerical or categorical. If not set, Filtered Rows Color is used for markers that pass the filter. Color palettes can be defined either for columns in the column context panel, or via the Linear Color Scheme and Categorical Color Scheme properties.
Color Map | string | Categorical coloring time unit map function (applicable to dates only).
Show Color Selector | boolean |
Color Axis Type | string |
Invert Color Scheme | boolean |
Color Min | number |
Color Max | number |
Size
Size Column Name | string | A numerical column to use for size-coding markers. See also Marker Min Size and Marker Max Size.
Show Size Selector | boolean |
Marker
Markers Column Name | string | A categorical column that determines the shape of the markers.
Markers Map | string | Marker category time unit map function (applicable to dates only).
Marker Type | string |
Marker Default Size | number | By default, sized automatically based on the current dataframe.
Marker Opacity | number |
Jitter Size | number | Randomly shifts the (x, y) marker position by up to Jitter Size pixels. Useful when multiple points fall on the same exact position. If Jitter Size Y is defined, then Jitter Size shifts x only.
Jitter Size Y | number | Randomly shifts the y marker position by up to Jitter Size Y pixels.
Marker Draw Border | boolean |
Marker Border Width | number |
Marker Min Size | number |
Marker Max Size | number |
General
Lines Order Column Name | string | When defined, a line is drawn for each series (defined by the categorical color column) using the order specified by Lines Order.
Lines Width | number | Defines the width of the lines connecting the markers.
Show Min Max Tickmarks | boolean | Shows tickmarks and labels for minimum and maximum value on each axis.
Show Drop Lines | boolean | Shows exact X and Y coordinates for the mouse cursor.
Mouse Drag | string |
Lasso Tool | boolean | When true, lasso area selector is used instead of the rectangular one. Toggle this option by pressing L.
Allow Zoom | boolean |
Legend Visibility | visibilitymode |
Legend Position | flexautoposition |
Row Source | string | Determines the rows shown on the plot.
Allow Dynamic Menus | boolean |
Show Context Menu | boolean |
Title | string |
Description | string | Viewer description that gets shown at the Description Position. Markup is supported.
Help | string | Help to be shown when user clicks on the '?' icon on top. Could either be in markdown, or a URL (starting with '/' or 'http').
Description Position | flexposition |
Description Visibility Mode | visibilitymode |
Labels
Label Column Names | list | Label columns to show next to the markers.
Show Labels For | string | Determines the rows for which labels are shown.
Display Labels | visibilitymode | Determines how to show marker labels. Always: show labels for all visible markers. Auto: show labels only for markers where enough space is available. Never: show no labels.
Show Label Named Columns | visibilitymode | Determines whether to show column names next to label values.
Use Label As Marker | boolean | If checked, displays the label content as the marker.
Label Color As Marker | boolean | To display labels separately or as markers (works for non-text labels).
Label As Marker Size | number | Marker size in which label is inscribed.
Label Content Size | number | Label inner content size.
Lines
Show Regression Line | boolean | Regression line visibility (toggle by pressing R).
Show Regression Line Equation | boolean |
Show Spearman Correlation | boolean |
Show Pearson Correlation | boolean |
Show Mean Absolute Error | boolean |
Show Root Mean Square Error | boolean |
Regression Per Category | boolean |
Show Dataframe Formula Lines | boolean | Controls the visibility of dataframe-originated formula lines. Edit formula lines by right-clicking and selecting Tools | Formula Lines from the popup menu. Requires the PowerPack plugin.
Show Viewer Formula Lines | boolean | Controls the visibility of viewer-originated formula lines. Edit formula lines by right-clicking and selecting Tools | Formula Lines from the popup menu. Requires the PowerPack plugin.
Selection
Show Current Point | boolean | Controls the indication of the current row.
Show Mouse Over Point | boolean | Controls the indication of the mouse-over row.
Show Mouse Over Row Group | boolean | Highlights 'mouse-over' rows (such as the ones that fall into a histogram bin that the mouse is currently hovering over).
Show Selected Rows | boolean | When true, selected markers are highlighted using the selected rows color. When false, selected markers use their regular color coding.
Style
Auto Layout | boolean |
Back Color | number |
Filtered Rows Color | number |
Filtered Out Rows Color | number |
Selected Rows Color | number |
Missing Value Color | number |
Label Color | number |
Axis Line Color | number |
Axis Text Color | number |
Grid Line Color | number |
Regression Line Color | number |
Whisker Color | number |
Regression Line Transparency | number |
Linear Color Scheme | list |
Categorical Color Scheme | list |
Axes Use Column Format | boolean | Determines whether the axes should follow the non-precision-related format (such as money) set for the corresponding column.
Auto Axis Size | boolean | If true, X Axis Height and Y Axis Width are calculated automatically to fit the required precision. If false, the specified X Axis Height and Y Axis Width properties are used.
X Axis Height | number | Requires Auto Axis Size to be turned off.
Y Axis Width | number | Requires Auto Axis Size to be turned off.
Axis Font | string |
Label Font | string |
Formula Font | string |
Annotation Font | string |
Controls Font | string | Viewer controls elements font.
Annotation regions
Show Viewer Annotation Regions | boolean |
Show Dataframe Annotation Regions | boolean |
Tooltip
Show Tooltip | string | Controls scatter plot tooltip visibility.
Show Labels | visibilitymode |
Data Values | string | Controls whether columns on X and Y axes are displayed in the tooltip. Do not add: they are not shown. Data values only: only they are shown. Merge: standard behavior.
Row Tooltip | string | Newline-separated list of column names to be used in a tooltip. Requires showTooltip to be enabled.
Row Group Tooltip | string |
Description
Show Title | boolean |
