count

Count plots show the frequency of values within categorical columns using bars.

Usage

gurita count [-h] [-x COLUMN] [-y COLUMN] ... other arguments ...

Arguments

Argument	Description	Reference
`-h`	display help	help
`-x COLUMN` `--xaxis COLUMN`	select column for the X axis	X axis
`-y COLUMN` `--yaxis COLUMN`	select column for the Y axis	Y axis
`--order VALUE [VALUE ...]`	order of the plotted columns	order
`--hue COLUMN`	colour and/or group columns by hue	hue
`--hueorder VALUE [VALUE ...]`	order of hue columns	hue order
`--logx`	log scale X axis	log X axis
`--logy`	log scale Y axis	log Y axis
`--xlim BOUND BOUND`	range limit X axis	limit X axis
`--ylim BOUND BOUND`	range limit Y axis	limit Y axis
`--frow COLUMN`	column to use for facet rows	facet rows
`--fcol COLUMN`	column to use for facet columns	facet columns
`--fcolwrap INT`	wrap the facet column at this width, to span multiple rows	facet wrap

Simple example

Plot a count of the embark_town categorical column from the titanic.csv input file:

gurita count -x embark_town < titanic.csv

The output of the above command is written to count.embark_town.png:

Count plot showing the frequency of the categorical values in the embark_town column from the titanic.csv file
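
To make explicit what the plot shows, the following Python sketch tallies a categorical column the way a count plot does: one bar per distinct value, with the bar length equal to the number of rows holding that value. It is an illustration only and not gurita's implementation; the inline rows are made-up sample data, and `count_values` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for a CSV file such as titanic.csv.
SAMPLE_CSV = """embark_town,class
Southampton,Third
Cherbourg,First
Southampton,First
Queenstown,Third
Southampton,Third
"""

def count_values(csv_text, column):
    """Return a Counter mapping each value in `column` to its row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)

counts = count_values(SAMPLE_CSV, "embark_town")
for value, n in counts.items():
    print(f"{value}: {n}")
```

Each key of `counts` corresponds to one bar in the plot, and each number to that bar's length.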

Getting help

The full set of command line arguments for count plots can be obtained with the -h or --help arguments:

gurita count -h

Selecting columns to plot

-x COLUMN, --xaxis COLUMN
-y COLUMN, --yaxis COLUMN

Count plots can be plotted for categorical columns.

Note

If a numerical column is selected for a count plot it will be treated as categorical, which may not give expected behaviour.

You may not use both -x COLUMN and -y COLUMN in the same command line for count plots.

You can select the column that you want to plot as a count using the -x (--xaxis) or -y (--yaxis) arguments.

If -x (--xaxis) is chosen the bars will be plotted vertically.

If -y (--yaxis) is chosen the bars will be plotted horizontally.

The simple example above shows a vertical plot. For comparison, the following command uses -y embark_town to plot the counts of embark_town horizontally:

gurita count -y embark_town < titanic.csv

Count plot showing the frequency of the categorical values in the embark_town column from the titanic.csv file, plotted horizontally

Controlling the order of the bars

--order VALUE [VALUE ...]

By default the order of the bars displayed in the count plot is determined by the order in which their values occur in the input data. This can be overridden with the --order argument, which lets you specify the exact ordering of the bars by listing their values.

In the following example the counts of the embark_town column are displayed in the order of Cherbourg, Queenstown, Southampton:

gurita count -x embark_town --order Cherbourg Queenstown Southampton < titanic.csv

Count plot showing the frequency of the categorical values in the embark_town column from the titanic.csv file, with specific order
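
The difference between the default and an explicit ordering can be sketched in Python as follows. This assumes that "occurrence in the input data" means order of first appearance; `bar_order` is a hypothetical helper and the list of towns is made-up sample data, so treat it as an illustration and not as gurita's code.

```python
def bar_order(values, order=None):
    """Return the bar labels in display order.

    With no explicit order, labels follow first occurrence in the data
    (an assumption about the default behaviour).
    """
    if order is not None:
        return list(order)
    # dict preserves insertion order, so this keeps first-occurrence order.
    return list(dict.fromkeys(values))

# Made-up column values in the order they might appear in an input file.
towns = ["Southampton", "Cherbourg", "Southampton", "Queenstown", "Cherbourg"]

default_order = bar_order(towns)
explicit_order = bar_order(towns, ["Cherbourg", "Queenstown", "Southampton"])
print(default_order)
print(explicit_order)
```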

Grouping columns with hue

--hue COLUMN

The column being counted can be coloured and/or grouped based on another categorical column using the --hue argument.

In the following example each value of the embark_town column is given its own colour by using embark_town as the hue column as well:

gurita count -x embark_town --hue embark_town < titanic.csv

Count plot showing the frequency of the categorical values in the embark_town column from the titanic.csv file, coloured separately

In the following example the counts of the embark_town column are grouped by the class column from the titanic data set:

gurita count -x embark_town --hue class < titanic.csv

Count plot showing the frequency of the categorical values in the embark_town column from the titanic.csv file, grouped by the class column
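
Grouping by hue amounts to counting pairs of values instead of single values. The Python sketch below shows the idea on made-up rows; it is only an illustration of the tally behind a grouped count plot, not gurita's implementation.

```python
from collections import Counter

# Made-up (embark_town, class) pairs standing in for rows of the input.
rows = [
    ("Southampton", "Third"),
    ("Southampton", "First"),
    ("Cherbourg", "First"),
    ("Southampton", "Third"),
    ("Queenstown", "Third"),
]

# One bar per (category, hue) combination; its length is the pair's count.
grouped = Counter(rows)

for (town, travel_class), n in sorted(grouped.items()):
    print(f"{town:12} {travel_class:6} {n}")
```

Within each town, the plot draws one bar per class value that occurs, coloured by class.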

By default the order of the bars within each hue group is determined by the order in which the hue values occur in the input data. This can be overridden with the --hueorder argument, which lets you specify the exact ordering of the bars within each hue group by listing the hue values.

In the following example the class values are displayed in the order of First, Second, Third:

gurita count -x embark_town --hue class --hueorder First Second Third < titanic.csv

Count plot showing the frequency of the categorical values in the embark_town column from the titanic.csv file, grouped by the class column, displayed in a specified order

It is possible to use both --order and --hueorder in the same command. For example, the following command controls the order of the values of both the embark_town and class columns:

gurita count -x embark_town --hue class --order Cherbourg Queenstown Southampton \
            --hueorder First Second Third < titanic.csv

Count plot of embark_town showing grouping on town and on class, where the order of values is specified

Log scale

--logx
--logy

Count values can be displayed in log (base 10) scale with --logx and --logy.

It only makes sense to log-scale the count axis (and not the categorical axis). Therefore, --logx should be used when categorical columns are selected with -y, and conversely, --logy should be used when categorical columns are selected with -x.

For example, the following command displays the counts for the embark_town column on a log scale. Because the categorical data is displayed on the X axis (-x), the --logy argument is used to log-scale the counts:

gurita count -x embark_town --logy < titanic.csv
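
As a rough illustration of what a base 10 log scale does to counts, the Python sketch below computes log10 for a few values. The counts are illustrative round numbers, not figures from titanic.csv, and the code is not part of gurita.

```python
import math

# Illustrative counts only (not taken from titanic.csv).
counts = {"A": 1000, "B": 100, "C": 10}

# On a log (base 10) axis a count is positioned according to log10(count),
# so equal ratios between counts become equal distances along the axis.
log_positions = {label: math.log10(n) for label, n in counts.items()}

for label, position in log_positions.items():
    print(f"{label}: count={counts[label]}, log10={position:.1f}")
```

Here each count is ten times the next, so the three bars end one unit apart on the log axis, which keeps small categories visible next to much larger ones.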

Axis range limits

--xlim LOW HIGH
--ylim LOW HIGH

The range of displayed count values can be restricted with --xlim and --ylim. Each of these flags takes two numerical values as arguments that represent the lower and upper bounds of values to be displayed.

It only makes sense to range-limit the count axis (and not the categorical axis). Therefore, --xlim should be used when categorical columns are selected with -y, and conversely, --ylim should be used when categorical columns are selected with -x.

For example, the following command displays a range-limited count for the embark_town column. Because the categorical data is displayed on the X axis (-x), the --ylim argument is used to range-limit the counts:

gurita count -x embark_town --ylim 100 300 < titanic.csv

Count plot showing the count of embark_town in the titanic data set, with Y axis limited in range
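
The effect of a range limit on individual bars can be sketched in Python as below. This assumes the limits only change the visible window of the count axis (as axis limits usually do in plotting libraries) and do not filter the data, and that bars start at zero; `visible_span` is a hypothetical helper and the counts are illustrative.

```python
def visible_span(count, low, high):
    """Return the (start, end) part of a bar of length `count` that falls
    inside the window [low, high], or None if the bar ends at or below `low`.
    Assumes bars start at zero and 0 <= low < high."""
    if count <= low:
        return None
    return (low, min(count, high))

# Illustrative counts only (not taken from titanic.csv).
counts = {"A": 650, "B": 170, "C": 80}
spans = {label: visible_span(n, 100, 300) for label, n in counts.items()}
print(spans)
```

Under these assumptions, a bar longer than the upper bound is cut off at the top of the window, and a bar shorter than the lower bound is not visible at all.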