swarm

Swarm plots show the distribution of values in a numerical column, optionally grouped by categorical columns.

Usage

gurita swarm [-h] [-x COLUMN] [-y COLUMN] ... other arguments ...

Arguments

Argument	Description	Reference
`-h`	display help	help
`-x COLUMN` `--xaxis COLUMN`	select column for the X axis	X axis
`-y COLUMN` `--yaxis COLUMN`	select column for the Y axis	Y axis
`--orient {v,h}`	Orientation of plot. Allowed values: v = vertical, h = horizontal. Default: v.	orient
`--order VALUE [VALUE ...]`	control the order of the plotted swarms	order
`--hue COLUMN`	colour and/or group columns by hue	hue
`--dodge`	separate hue levels along the categorical axis	dodge
`--hueorder VALUE [VALUE ...]`	order of hue columns	hue order
`--logx`	log scale X axis	log X
`--logy`	log scale Y axis	log Y
`--xlim BOUND BOUND`	range limit X axis	limit X axis
`--ylim BOUND BOUND`	range limit Y axis	limit Y axis
`--frow COLUMN`	column to use for facet rows	facet rows
`--fcol COLUMN`	column to use for facet columns	facet columns
`--fcolwrap INT`	wrap the facet column at this width, to span multiple rows	facet wrap

Simple example

Swarm plot of the age numerical column from the titanic.csv input file:

gurita swarm -y age < titanic.csv

The output of the above command is written to swarm.age.png:

Swarm plot showing the distribution of age for the titanic data set

The plotted numerical column can be divided into groups based on a categorical column. In the following example the distribution of age is shown for each value in the class column:

gurita swarm -y age -x class < titanic.csv

The output of the above command is written to swarm.class.age.png:

Swarm plot showing the distribution of age for each class in the titanic data set

Getting help

The full set of command line arguments for swarm plots can be obtained with the -h or --help arguments:

gurita swarm -h

Selecting columns to plot

-x COLUMN, --xaxis COLUMN
-y COLUMN, --yaxis COLUMN

Swarm plots can be plotted for numerical columns and optionally grouped by categorical columns.

If no categorical column is specified, a single column swarm plot will be generated showing the distribution of the numerical column.

Note

By default the orientation of the swarm plot is vertical. In this scenario the numerical column is specified by -y, and the (optional) categorical column is specified by -x.

However, the swarm plot can be made horizontal with the --orient h argument. In this case the roles of the X and Y axes are swapped from the default: the numerical column is specified by -x, and the (optional) categorical column is specified by -y.

In the following example the distribution of age is shown for each value in the class column, where the swarms are plotted horizontally:

gurita swarm -x age -y class --orient h < titanic.csv

Swarm plot showing the distribution of age for each class in the titanic data set, shown horizontally

Controlling the order of the swarms

--order VALUE [VALUE ...]

By default the order of the categorical columns displayed in the swarm plot is determined from their occurrence in the input data. This can be overridden with the --order argument, which allows you to specify the exact ordering of columns based on their values.

In the following example the swarm columns of the class column are displayed in the order of First, Second, Third:

gurita swarm -y age -x class --order First Second Third < titanic.csv

Swarm plot showing the distribution of age for each class in the titanic data set, shown in a specified order
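Before choosing an explicit order it can help to see which values a categorical column contains, and in which order they first occur in the input. The following is a minimal sketch using standard shell tools only. The helper name first_seen_values and the inline sample data are hypothetical and not part of Gurita, and the comma splitting is naive (it assumes no quoted fields that contain commas):

```shell
# Hypothetical helper, not part of Gurita: print the distinct values of a
# named CSV column in order of first occurrence. Reads CSV from stdin.
first_seen_values() {
    awk -F, -v name="$1" '
        # Header row: find the index of the requested column
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        # Data rows: print each value the first time it is seen
        col && !seen[$col]++ { print $col }
    '
}

# Made-up sample rows standing in for a real input file
first_seen_values class <<'EOF'
class,age
Third,22
First,38
Third,26
Second,35
EOF
# prints Third, First, Second (one per line)
```

With a real file the helper would be run as first_seen_values class < titanic.csv, and the values it prints can then be rearranged and passed to --order.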

Colour and/or group columns with hue

--hue COLUMN

Each swarm can be coloured and optionally subdivided into additional categories with the --hue argument.

The following example generates a swarm plot showing the distribution of the age of titanic passengers across the three different ticket classes, where each class is coloured differently:

gurita swarm -y age -x class --hue class < titanic.csv

Swarm plot showing the distribution of age for each class in the titanic data set, grouped by class and coloured by class

In the following example the distribution of age is shown for each value in the class column, and further sub-divided by the sex column:

gurita swarm -y age -x class --hue sex < titanic.csv

Swarm plot showing the distribution of age for each class in the titanic data set, grouped by class and sex

As the previous example demonstrates, when --hue is used, by default all hue levels are shown mixed together in the same swarm. However, you might want to show each hue level in its own swarm. This can be achieved with the --dodge argument.

The --dodge argument will separate hue levels along the categorical axis, rather than mix them together:

gurita swarm -y age -x class --hue sex --dodge < titanic.csv

Swarm plot showing the distribution of age for each class in the titanic data set, grouped by class and sex, with the sex data separated into swarms

By default the order of the columns within each hue group is determined from their occurrence in the input data. This can be overridden with the --hueorder argument, which allows you to specify the exact ordering of columns within each hue group, based on their values.

In the following example the sex values are displayed in the order of female, male:

gurita swarm -y age -x class --hue sex --hueorder female male < titanic.csv

Swarm plot showing the distribution of age for each class in the titanic data set, grouped by class and sex, and the order of sex values specified

It is also possible to use both --order and --hueorder in the same command. For example, the following command controls the order of both the class and sex categorical columns:

gurita swarm -y age -x class --order First Second Third --hue sex --hueorder female male < titanic.csv

Swarm plot showing the distribution of age for each class in the titanic data set, grouped by class and sex, and the order of class and sex values specified

Log scale

--logx
--logy

The distribution of numerical values can be displayed in log (base 10) scale with --logx and --logy.

It only makes sense to log-scale the numerical axis (and not the categorical axis). Therefore, --logx should be used when numerical columns are selected with -x, and conversely, --logy should be used when numerical columns are selected with -y.

For example, you can display a log scale swarm plot for the age column grouped by class (when the distribution of age is displayed on the Y axis) like so. Note carefully that the numerical data is displayed on the Y-axis (-y), therefore the --logy argument should be used to log-scale the numerical distribution:

gurita swarm -y age -x class --logy < titanic.csv

Swarm plot showing the distribution of age for each class in the titanic data set, with the Y axis in log scale
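The pairing rule above (log-scale whichever axis holds the numerical column) also applies to horizontal plots, where the numerical column moves to -x and the matching flag becomes --logx. As a sketch, the hypothetical shell helper below, which is not part of Gurita, prints the corresponding command for either orientation using only the arguments documented on this page:

```shell
# Hypothetical helper, not part of Gurita: print a log-scaled swarm command
# for the given orientation (v or h), numerical column and categorical column.
swarm_log_cmd() {
    orient=$1 numeric=$2 categorical=$3
    case "$orient" in
        # Vertical: numerical column on Y, so log-scale Y
        v) echo "gurita swarm -y $numeric -x $categorical --orient v --logy" ;;
        # Horizontal: numerical column on X, so log-scale X
        h) echo "gurita swarm -x $numeric -y $categorical --orient h --logx" ;;
        *) echo "orient must be v or h" >&2; return 1 ;;
    esac
}

swarm_log_cmd h age class
# prints: gurita swarm -x age -y class --orient h --logx
```

The printed command reads its data from standard input, so it would be run with < titanic.csv appended, as in the other examples.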

Axis range limits

--xlim LOW HIGH
--ylim LOW HIGH

The range of displayed numerical distributions can be restricted with --xlim and --ylim. Each of these flags takes two numerical values as arguments that represent the lower and upper bounds of the range to be displayed.

It only makes sense to range-limit the numerical axis (and not the categorical axis). Therefore, --xlim should be used when numerical columns are selected with -x, and conversely, --ylim should be used when numerical columns are selected with -y.

For example, you can display a range-limited swarm plot for the age column grouped by class (when the distribution of age is displayed on the Y axis) like so. Note carefully that the numerical data is displayed on the Y axis (-y), therefore the --ylim argument should be used to range-limit the distribution:

gurita swarm -y age -x class --ylim 10 30 < titanic.csv

Swarm plot showing the distribution of age for each class in the titanic data set, with the Y axis limited to values in the range 10 to 30 inclusive
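When picking bounds it can be useful to know the actual minimum and maximum of the numerical column first. The following is a minimal sketch with standard shell tools. The helper name column_range and the inline sample rows are hypothetical and not part of Gurita; the parsing is naive (plain comma splitting, empty cells skipped, values assumed numeric):

```shell
# Hypothetical helper, not part of Gurita: print "MIN MAX" for a named
# numerical CSV column. Reads CSV from stdin and skips empty cells.
column_range() {
    awk -F, -v name="$1" '
        # Header row: find the index of the requested column
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        # Data rows: track the smallest and largest non-empty value
        col && $col != "" {
            v = $col + 0
            if (!n || v < min) min = v
            if (!n || v > max) max = v
            n++
        }
        END { if (n) print min, max }
    '
}

# Made-up sample rows standing in for a real input file
column_range age <<'EOF'
class,age
Third,22
First,38
Second,
Third,4
EOF
# prints: 4 38
```

With a real file this would be run as column_range age < titanic.csv, and the two numbers give the widest range worth passing to --ylim (or to --xlim for a horizontal plot).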
