scatter
Scatter plots show the relationship between two columns by plotting each row of the input data as a point.
Usage
gurita scatter [-h] [-x COLUMN] [-y COLUMN] ... other arguments ...
Arguments
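Argument | Description | Reference |
---|---|---|
-h, --help | display help | Getting help |
-x COLUMN, --xaxis COLUMN | select column for the X axis | Selecting columns to plot |
-y COLUMN, --yaxis COLUMN | select column for the Y axis | Selecting columns to plot |
--hue COLUMN | group columns by hue | Colouring data points with hue |
--hueorder | order of hue columns | Colouring data points with hue |
--dotstyle COLUMN | name of categorical column to use for plotted dot marker style | Dot style |
--dotsize COLUMN | scale the size of plotted dots based on a column | Dot size |
--dotsizerange LOW HIGH | size range for plotted point size | Dot size |
--dotalpha ALPHA | alpha value for plotted points, default: 0.8 | Dot transparency, border line width, border line colour |
--dotlinewidth WIDTH | border line width value for plotted points | Dot transparency, border line width, border line colour |
--dotlinecolour COLOUR | border line colour for plotted points | Dot transparency, border line width, border line colour |
--logx | log scale X axis | Log scale |
--logy | log scale Y axis | Log scale |
--xlim LOW HIGH | range limit X axis | Axis range limits |
--ylim LOW HIGH | range limit Y axis | Axis range limits |
--frow COLUMN | column to use for facet rows | Facets |
--fcol COLUMN | column to use for facet columns | Facets |
--fcolwrap INT | wrap the facet column at this width, to span multiple rows | Facets |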
See also
When one of the two columns being compared is categorical, the scatter plot is similar to a strip plot.
Scatter plots are based on Seaborn’s relplot library function, using the kind="scatter" option.
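As a rough illustration of the underlying library call, the Python sketch below draws a comparable scatter plot directly with Seaborn's relplot. It is not gurita's own implementation: the small data frame is made-up data standing in for tips.csv, and the output location, the Agg backend and the alpha setting are assumptions chosen for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import pandas as pd
import seaborn as sns

# Made-up rows standing in for tips.csv (assumption for this sketch).
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# relplot with kind="scatter" is the Seaborn function named above;
# alpha=0.8 mirrors the documented default dot transparency.
grid = sns.relplot(data=tips, x="total_bill", y="tip", kind="scatter", alpha=0.8)

# Write the figure to a temporary directory (path chosen for this example).
out_path = os.path.join(tempfile.gettempdir(), "scatter.total_bill.tip.png")
grid.savefig(out_path)
```

Any file handling, output naming and styling that gurita performs itself is not reproduced here.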
Simple example
Scatter plot of the tip numerical column compared to the total_bill numerical column from the tips.csv input file:
gurita scatter -x total_bill -y tip < tips.csv
The output of the above command is written to scatter.total_bill.tip.png.
Getting help
The full set of command line arguments for scatter plots can be obtained with the -h or --help arguments:
gurita scatter -h
Selecting columns to plot
-x COLUMN, --xaxis COLUMN
-y COLUMN, --yaxis COLUMN
Scatter plots can be plotted for two numerical columns as illustrated in the example above, one on each of the axes.
Scatter plots can also be used to compare a numerical column against a categorical column. In the example below, the numerical tip column is compared with the categorical day column in the tips.csv dataset:
gurita scatter -x day -y tip < tips.csv
Strip plots achieve a similar result and may be preferable to scatter plots when comparing numerical and categorical data.
Swapping -x and -y in the above command would produce a horizontal plot instead of a vertical plot.
Colouring data points with hue
--hue COLUMN
The data points can be coloured by an additional numerical or categorical column with the --hue argument.
In the following example the data points in a scatter plot comparing tip and total_bill are coloured by their corresponding categorical day value:
gurita scatter -x total_bill -y tip --hue day < tips.csv
When the --hue parameter specifies a numerical column the colour scale is graduated.
For example, in the following scatter plot the numerical size column is used for the --hue argument:
gurita scatter -x total_bill -y tip --hue size < tips.csv
For categorical hue groups, the order displayed in the legend is determined by the order in which they occur in the input data. This can be overridden with the --hueorder argument, which allows you to specify the exact ordering of the hue groups in the legend.
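The difference between the default ordering and an explicit ordering can be sketched in a few lines of Python. This is only an illustration of the idea, not gurita code: the list of day values is made up, and "occurrence in the input data" is assumed here to mean order of first appearance.

```python
# Made-up values of the categorical "day" column, in file order (assumption).
days = ["Sun", "Sun", "Sat", "Thur", "Sat", "Fri", "Thur"]

# Default: groups appear in order of first occurrence.
# dict.fromkeys keeps insertion order and drops repeats.
default_order = list(dict.fromkeys(days))

# Explicit ordering, analogous to what --hueorder lets you request.
requested = ["Thur", "Fri", "Sat", "Sun"]
present = set(days)
explicit_order = [day for day in requested if day in present]

print(default_order)
print(explicit_order)
```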
Dot style
--dotstyle COLUMN
By default dots in scatter plots are drawn as circles.
The --dotstyle argument lets you change the shape of dots based on a categorical column.
gurita scatter -x total_bill -y tip --hue day --dotstyle sex < tips.csv
In the above example the hue of dots is determined by the day column and the dot marker style is determined by the sex column. In this case male dots use a cross marker and female dots use a circle marker.
It is acceptable for both the --hue and --dotstyle arguments to be based on the same (categorical) column in the data set. In such cases both the colour and marker shape will vary with the underlying column.
Dot size
--dotsize COLUMN
--dotsizerange LOW HIGH
The size of plotted dots in the scatter plot can be scaled according to a numerical column with the --dotsize argument.
The following example generates a scatter plot comparing sepal_length to sepal_width using the iris.csv dataset. The size of dots in the plot is scaled according to the petal_length column.
gurita scatter -x sepal_length -y sepal_width --dotsize petal_length < iris.csv
The range of dot sizes can be adjusted with --dotsizerange LOW HIGH.
gurita scatter -x sepal_length -y sepal_width --dotsize petal_length --dotsizerange 10 200 < iris.csv
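One way to picture what a size range does is a linear mapping from the column's smallest and largest values onto LOW and HIGH. The Python sketch below assumes such a linear mapping; the function name scale_dot_sizes and the petal_length values are hypothetical, and the handling of a constant column is an arbitrary choice made for this example.

```python
def scale_dot_sizes(values, low, high):
    """Map each value linearly onto the range [low, high]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # All values equal: using the midpoint is an arbitrary choice here.
        return [(low + high) / 2 for _ in values]
    return [low + (v - vmin) / (vmax - vmin) * (high - low) for v in values]


# Made-up petal_length values standing in for the iris.csv column.
petal_length = [1.4, 4.7, 6.9]
sizes = scale_dot_sizes(petal_length, 10, 200)
print(sizes)
```

With a range of 10 to 200, the smallest value maps to 10, the largest to 200, and values in between are placed proportionally.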
Dot transparency, border line width, border line colour
--dotalpha ALPHA
--dotlinewidth WIDTH
--dotlinecolour COLOUR
The transparency of dots is defined by the dot alpha value, which is a number ranging from 0 to 1, where 0 is fully transparent and 1 is fully opaque.
By default the alpha transparency value of scatter plot dots is 0.8. This can be overridden with --dotalpha.
Dots are plotted with a thin white border by default. The border line width can be changed with --dotlinewidth and the border line colour can be changed with --dotlinecolour.
In the following example, the dot alpha is set to 1 (fully opaque), the border line width is set to 0.5, and the border line colour is set to black.
gurita scatter -x total_bill -y tip --dotalpha 1 --dotlinewidth 0.5 --dotlinecolour black < tips.csv
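To see why overlapping dots look darker at the default alpha of 0.8, it can help to work through standard alpha compositing for one colour channel. The sketch below assumes the usual "over" blending rule and a black dot on a white background; the function name blend is hypothetical and this is not taken from gurita's code.

```python
def blend(foreground, background, alpha):
    """Composite one colour channel (0-255) over another with the given alpha."""
    return alpha * foreground + (1 - alpha) * background


white, black = 255, 0

one_dot = blend(black, white, 0.8)     # a single black dot on a white page
two_dots = blend(black, one_dot, 0.8)  # a second dot drawn over the first
opaque = blend(black, white, 1.0)      # alpha 1 hides the background entirely

print(round(one_dot), round(two_dots), round(opaque))
```

Under these assumptions a single dot leaves about 20% of the background showing through, and each additional overlapping dot darkens the result further, which makes dense regions of the plot easier to spot.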
Log scale
--logx
--logy
The distribution of numerical values can be displayed in log (base 10) scale with --logx and --logy.
For example the following command produces a scatter plot comparing total_bill with tip, such that total_bill on the X axis is plotted in log scale:
gurita scatter -x total_bill -y tip --logx < tips.csv
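The effect of a base 10 log scale can be checked with a short calculation: values that differ by a factor of ten end up equally spaced along the axis. The values below are made up for illustration. Only positive values have a logarithm, so zero or negative values cannot be placed on a log axis.

```python
import math

# Made-up positive values, each ten times the previous one.
values = [1, 10, 100, 1000]

# Position of each value along a base 10 log axis.
positions = [math.log10(v) for v in values]

# Distance between neighbouring positions.
gaps = [b - a for a, b in zip(positions, positions[1:])]

print(positions)
print(gaps)
```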
Axis range limits
--xlim LOW HIGH
--ylim LOW HIGH
The range of displayed numerical distributions can be restricted with --xlim and --ylim. Each of these flags takes two numerical values as arguments that represent the lower and upper bounds of the (inclusive) range to be displayed.
For example the following command produces a scatter plot comparing total_bill with tip, such that the range of total_bill on the X axis is limited to values between 20 and 40 inclusive:
gurita scatter -x total_bill -y tip --xlim 20 40 < tips.csv
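The inclusive bounds can be illustrated with a small Python check of which values fall inside the displayed range. The total_bill values and the helper within_limits are hypothetical. The sketch assumes that points outside the limits simply fall outside the drawn area rather than being removed from the data.

```python
def within_limits(value, low, high):
    """True when value lies in the inclusive range [low, high]."""
    return low <= value <= high


# Made-up total_bill values standing in for the tips.csv column.
total_bill = [16.99, 20.0, 24.59, 40.0, 48.27]

# Values that would be visible with limits of 20 and 40.
shown = [v for v in total_bill if within_limits(v, 20, 40)]
print(shown)
```

Both boundary values, 20 and 40, are kept because the range is inclusive.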
Facets
--frow COLUMN
--fcol COLUMN
--fcolwrap INT
Scatter plots can be further divided into facets, generating a matrix of scatter plots in which the data is further categorised by up to 2 more categorical columns.
See the facet documentation for more information on this feature.
For example the following command produces a scatter plot comparing total_bill with tip, such that the facet column is determined by the value of the smoker column:
gurita scatter -x total_bill -y tip --fcol smoker < tips.csv
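The wrapping behaviour of --fcolwrap can be pictured as laying out facets left to right and starting a new row once the given width is reached. The Python sketch below assumes that layout rule; the function name wrapped_layout and the list of day values are hypothetical and used only for illustration.

```python
import math


def wrapped_layout(facet_values, wrap):
    """Return {value: (row, col)} for facets laid out left to right, wrapping at `wrap`."""
    return {value: divmod(i, wrap) for i, value in enumerate(facet_values)}


# Made-up facet values, for example the distinct values of a day column.
days = ["Thur", "Fri", "Sat", "Sun"]

layout = wrapped_layout(days, 2)
n_rows = math.ceil(len(days) / 2)

print(layout)
print(n_rows)
```

Four facet values wrapped at a width of 2 occupy two rows of two plots each, instead of a single row of four.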