scatter

Scatter plots show the relationship between two columns as a scatter of data points.

Usage

gurita scatter [-h] [-x COLUMN] [-y COLUMN] ... other arguments ...

Arguments

Argument

Description

Reference

-h

display help

help

  • -x COLUMN

  • --xaxis COLUMN

select column for the X axis

X axis

  • -y COLUMN

  • --yaxis COLUMN

select column for the Y axis

Y axis

--hue COLUMN

group columns by hue

hue

--hueorder VALUE [VALUE ...]

order of hue columns

hue order

--dotstyle COLUMN

name of categorical column to use for plotted dot marker style

dot style

--dotsize COLUMN

scale the size of plotted dots based on a column

dot size

--dotsizerange LOW HIGH

size range for plotted point size

dot size range

--dotalpha ALPHA

alpha value for plotted points, default: 0.8

dot alpha

--dotlinewidth WIDTH

border line width value for plotted points

dot line width

--dotlinecolour COLOUR

border line colour plotted point

dot border colour

--logx

log scale X axis

log X axis

--logy

log scale Y axis

log Y axis

--xlim BOUND BOUND

range limit X axis

limit X axis

--ylim BOUND BOUND

range limit Y axis

limit Y axis

--frow COLUMN

column to use for facet rows

facet rows

--fcol COLUMN

column to use for facet columns

facet columns

--fcolwrap INT

wrap the facet column at this width, to span multiple rows

facet wrap

See also

When one of the two columns being compared is a categorical value the scatter plot is similar to strip plot.

Scatter plots are based on Seaborn’s relplot library function, using the kind="scatter" option.

Simple example

Scatter plot of the tip numerical column compared to the total_bill numerical column from the tips.csv input file:

gurita scatter -x total_bill -y tip < tips.csv

The output of the above command is written to scatter.total_bill.tip.png:

Scatter plot comparing tip to total_bill in the tips.csv file

Getting help

The full set of command line arguments for scatter plots can be obtained with the -h or --help arguments:

gurita scatter -h

Selecting columns to plot

-x COLUMN, --xaxis COLUMN
-y COLUMN, --yaxis COLUMN

Scatter plots can be plotted for two numerical columns as illustrated in the example above, one on each of the axes.

Scatter plots can also be used to compare a numerical column against a categorical column. In the example below, the numerical tip column is compared with the categorical day column in the tips.csv dataset:

gurita scatter -x day -y tip < tips.csv
Scatter plot comparing tip to day in the tips.csv file

It should be noted that strip plots achieve a similar result as above, and may be preferable over scatter plots when comparing numerical and categorical data.

Swapping -x and -y in the above command would result in a horizontal plot instead of a vertical plot.

Colouring data points with hue

--hue COLUMN

The data points can be coloured by an additional numerical or categorical column with the --hue argument.

In the following example the data points in a scatter plot comparing tip and total_bill are coloured by their corresponding categorical day value:

gurita scatter -x total_bill -y tip --hue day < tips.csv
Scatter plot comparing tip and total_bill coloured by day

When the --hue paramter specifies a numerical column the colour scale is graduated. For example, in the following scatter plot the numerical size column is used for the --hue argument:

gurita scatter -x total_bill -y tip --hue size < tips.csv
Scatter plot comparing tip and total_bill coloured by size

For categorical hue groups, the order displayed in the legend is determined from their occurrence in the input data. This can be overridden with the --hueorder argument, which allows you to specify the exact ordering of the hue groups in the legend.

Dot style

--dotstyle COLUMN

By default dots in scatter plots are drawn as circles.

The --dotstyle argument lets you change the shape of dots based on a categorical column.

gurita scatter -x total_bill -y tip --hue day --dotstyle sex < tips.csv
Scatter plot comparing tip and total_bill with dot size where the dot style is based on the sex categorical column

In the above example the hue of dots is determined by the day column and the dot marker style is determined by the sex column. In this case male dots use a cross marker and female dots use a circle marker.

It is acceptable for both the --hue and --dotstyle arguments to be based on the same (categorical) column in the data set. In such cases both the colour and marker shape will vary with the underlying column.

Dot size

--dotsize COLUMN
--dotsizerange LOW HIGH

The size of plotted dots in the scatter plot can be scaled according the a numerical column with the --dotsize argument.

The following example generates a scatter plot comparing sepal_length to sepal_width using the iris.csv dataset. The size of dots in the plot is scaled according to the petal_length column.

gurita scatter -x sepal_length -y sepal_width --dotsize petal_length < iris.csv
Scatter plot comparing sepal_length and sepal_width with dot size scaled by petal_length using the iris.csv dataset

The range of dot sizes can be adjusted with --dotsizerange LOW HIGH.

gurita scatter -x sepal_length -y sepal_width --dotsize petal_length --dotsizerange 10 200 < iris.csv
Scatter plot comparing sepal_length and sepal_width with dot size scaled by petal_length using the iris.csv dataset, where the size range of dots is set between 10 and 200

Dot transparency, border line width, border line colour

--dotalpha ALPHA
--dotlinewidth WIDTH
--dotlinecolour COLOUR

The transparency of dots is defined by the dot alpha value, which is a number ranging from 0 to 1, where 0 is fully transparent and 1 is fully opaque.

By default the alpha transparency value of scatter plot dots is 0.8. This can be overridden with --dotalpha.

Dots are plotted with a thin white border by default. The border line width can be changed with --dotlinewidth and the border line colour can be changed with --dotlinecolour.

In the following example, the dot alpha is set to 1 (fully opaque), the border line width is set to 0.5, and the border line colour is set to black.

gurita scatter -x total_bill -y tip --dotalpha 1 --dotlinewidth 0.5 --dotlinecolour black < tips.csv
Scatter plot comparing tip and total_bill with dot alpha set to 1, dot line width set to 1, and dot line colour set to black

Log scale

--logx
--logy

The distribution of numerical values can be displayed in log (base 10) scale with --logx and --logy.

For example the following command produces a scatter plot comparing total_bill with tip, such that total_bill on the X axis is plotted in log scale:

gurita scatter -x total_bill -y tip --logx < tips.csv
Scatter plot comparing tip and total_bill with the X axis in log scale.

Axis range limits

--xlim LOW HIGH
--ylim LOW HIGH

The range of displayed numerical distributions can be restricted with --xlim and --ylim. Each of these flags takes two numerical values as arguments that represent the lower and upper bounds of the (inclusive) range to be displayed.

For example the following command produces a scatter plot comparing total_bill with tip, such that the range of total_bill on the X axis is limited to values between 20 and 40 inclusive:

gurita scatter -x total_bill -y tip --xlim 20 40 < tips.csv
Scatter plot comparing tip and total_bill with the X axis range limited to values between 20 and 40 inclusively.

Facets

--frow COLUMN
--fcol COLUMN
--fcolwrap INT

Scatter plots can be further divided into facets, generating a matrix of scatter plots, where a numerical value is further categorised by up to 2 more categorical columns.

See the facet documentation for more information on this feature.

For example the following command produces a scatter plot comparing total_bill with tip, such that facet column is determined by the value of the smoker column.

gurita scatter -x total_bill -y tip --fcol smoker < tips.csv
Scatter plot comparing tip and total_bill with facet columns determined by the value of smoker