bar

Bar plots summarise a numerical column as bars with optional error bars.

By default the numerical column is summarised by its mean, but other summary functions can be chosen.

Usage

usage: gurita bar [-h] [-x COLUMN] [-y COLUMN] ... other arguments ...

Arguments

Argument	Description	Reference
`-h`	display help for this command	help
`-x COLUMN` `--xaxis COLUMN`	select column for the X axis	X axis
`-y COLUMN` `--yaxis COLUMN`	select column for the Y axis	Y axis
`--orient {v,h}`	orientation of plot; allowed values: v = vertical, h = horizontal (default: v)	orient
`--estimator {mean, median, max, min, sum, std, var}`	function to compute point estimate of numerical column	estimator
`--sd [NUM]`	show standard deviation of numerical column as error bar	standard deviation error bar
`--ci [NUM]`	show confidence interval as error bar to estimate uncertainty of point estimate	confidence interval error bar
`--order VALUE [VALUE ...]`	control the order of the plotted bars	order
`--hue COLUMN`	colour and/or group columns by hue	hue
`--hueorder VALUE [VALUE ...]`	order of hue columns	hue order
`--logx`	log scale X axis (only relevant with `--orient h`)	log X axis
`--logy`	log scale Y axis	log Y axis
`--xlim BOUND BOUND`	range limit X axis	limit X axis
`--ylim BOUND BOUND`	range limit Y axis	limit Y axis
`--frow COLUMN`	column to use for facet rows	facet rows
`--fcol COLUMN`	column to use for facet columns	facet columns
`--fcolwrap INT`	wrap the facet column at this width, to span multiple rows	facet wrap

Simple example

Bar plot the mean age of passengers for each value of class in the titanic.csv input file:

gurita bar -y age -x class < titanic.csv

The output of the above command is written to bar.class.age.png:

Bar plot showing the mean of age for each class in the titanic data set

Getting help

The full set of command line arguments for bar plots can be obtained with the -h or --help arguments:

gurita bar -h

Selecting columns to plot

-x COLUMN, --xaxis COLUMN
-y COLUMN, --yaxis COLUMN

Bar plots can be plotted for numerical columns and optionally grouped by categorical columns.

If no categorical column is specified, a plot with a single bar will be generated showing a summary (mean by default) of the numerical column.

Note

By default the orientation of the bar plot is vertical. In this scenario the numerical column is specified by -y, and the (optional) categorical column is specified by -x.

However, the orientation of the bar plot can be made horizontal using the --orient h argument. In this case the roles of the X and Y axes are swapped from the default, and thus the numerical column is specified by -x, and the (optional) categorical column is specified by -y.

In the following example the mean of age is shown for each value in the class column, where the bars are plotted horizontally:

gurita bar -x age -y class --orient h < titanic.csv

Bar plot showing the mean of age for each class in the titanic data set, shown horizontally

Summary function

By default bar plots show the mean of the selected numerical column. However, alternative functions can be chosen using the --estimator argument.

The allowed choices are: mean, median, max, min, sum, std (standard deviation), var (variance).

For example, the following command shows the maximum age for each value of class:

gurita bar -y age -x class --estimator max < titanic.csv

Bar plot showing the maximum age for each class in the titanic data set
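The estimator choices above can be illustrated with a small standard-library sketch. This is not Gurita's implementation; it only shows what "summarise a numerical column per category" amounts to. The sample rows, the `ESTIMATORS` table and the `summarise` helper are hypothetical, and treating std and var as the sample (n - 1) versions is an assumption.

```python
import statistics
from collections import defaultdict

# Hypothetical sample rows (class, age); not taken from titanic.csv.
rows = [
    ("First", 38.0), ("First", 35.0), ("First", 54.0),
    ("Second", 14.0), ("Second", 55.0),
    ("Third", 22.0), ("Third", 26.0), ("Third", 2.0),
]

# Map each estimator name to a standard-library function.
# Assumption: std and var use the sample (n - 1) denominator.
ESTIMATORS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
    "min": min,
    "sum": sum,
    "std": statistics.stdev,
    "var": statistics.variance,
}


def summarise(rows, estimator="mean"):
    """Group values by category and reduce each group to one bar height."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    func = ESTIMATORS[estimator]
    return {category: func(values) for category, values in groups.items()}


bar_heights_mean = summarise(rows)
bar_heights_max = summarise(rows, "max")
print(bar_heights_max)
```

Each key of the returned dictionary corresponds to one bar, and its value to the height of that bar under the chosen estimator.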

Standard deviation

The standard deviation of the numerical column can be shown as an error bar with the --sd argument.

By default, if --sd is specified without a numerical argument, then +/- 1 standard deviation from the mean is shown, but this can be changed by supplying a specific numeric value.

For example, the mean and standard deviation of age are shown for each value in the class column:

gurita bar -y age -x class --sd < titanic.csv
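As a rough illustration of what such an error bar spans, the sketch below computes the interval of the mean plus or minus a multiple of the standard deviation. It is not Gurita's code: the `sd_error_bar` helper and the sample ages are hypothetical, and using the sample (n - 1) standard deviation is an assumption.

```python
import statistics

# Hypothetical ages for a single class; not taken from titanic.csv.
ages = [22.0, 26.0, 30.0, 34.0, 38.0]


def sd_error_bar(values, num=1.0):
    """Return (low, high) at the mean minus/plus num standard deviations."""
    centre = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 denominator).
    spread = statistics.stdev(values) * num
    return centre - spread, centre + spread


low, high = sd_error_bar(ages)            # analogous to a bare --sd
low2, high2 = sd_error_bar(ages, num=2)   # analogous to --sd 2
print(low, high)
print(low2, high2)
```

Doubling the numeric value doubles the length of the error bar while leaving its centre at the mean.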

Confidence interval

The confidence interval of the summary estimate can be shown as an error bar with the --ci argument.

By default, if --ci is specified without a numerical argument, then the 95% confidence interval is shown, but this can be changed by supplying a specific numeric value.

For example, the mean of age and its 98% confidence interval are shown for each value in the class column:

gurita bar -y age -x class --ci 98 < titanic.csv

Bar plot showing the mean of age and 98% confidence interval for each class in the titanic data set
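This page does not say how the confidence interval is computed. One common approach is a percentile bootstrap, sketched below purely as an illustration; whether Gurita uses this method is an assumption. The `bootstrap_ci` helper, its parameters and the sample ages are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, level=95, n_boot=1000, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    # Cut off (100 - level) / 2 percent of the resampled means on each side.
    tail = (100 - level) / 200
    low_index = int(tail * (n_boot - 1))
    high_index = int((1 - tail) * (n_boot - 1))
    return means[low_index], means[high_index]


# Hypothetical ages for a single class; not taken from titanic.csv.
ages = [22.0, 26.0, 30.0, 34.0, 38.0, 41.0, 19.0, 27.0]
low95, high95 = bootstrap_ci(ages, level=95)
low98, high98 = bootstrap_ci(ages, level=98)
print(low95, high95)
print(low98, high98)
```

A higher level keeps more of the resampled means inside the interval, so the 98% interval is at least as wide as the 95% one.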

Controlling the order of the bars

--order VALUE [VALUE...]

By default the order of the bars displayed in the bar plot is determined by the order in which the categorical values occur in the input data. This can be overridden with the --order argument, which allows you to specify the exact ordering of the bars based on their values.

In the following example the bars for the class column are displayed in the order First, Second, Third:

gurita bar -y age -x class --order First Second Third < titanic.csv

Bar plot showing the mean of age for each class in the titanic data set, shown in a specified order
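The difference between the default ordering and an explicit ordering can be sketched as follows. This is only an illustration: the sample values are hypothetical, and reading "occurrence in the input data" as order of first appearance is an assumption.

```python
# Hypothetical sequence of class values as they might appear in an input file.
classes = ["Third", "First", "Third", "Second", "First"]

# Assumed default: each distinct value, in order of first appearance.
default_order = list(dict.fromkeys(classes))

# Explicit order: rank each distinct value by its position in the request.
requested = ["First", "Second", "Third"]
rank = {value: position for position, value in enumerate(requested)}
explicit_order = sorted(set(classes), key=rank.__getitem__)

print(default_order)
print(explicit_order)
```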

Colour and/or group columns with hue

--hue COLUMN

Each bar can be coloured and optionally subdivided into additional categories with the --hue argument.

The following example generates a bar plot showing the mean age of titanic passengers across the three different ticket classes, where each class is coloured differently:

gurita bar -y age -x class --hue class < titanic.csv

Bar plot showing the mean of age for each class in the titanic data set, with each class coloured differently

In the following example the mean of age is shown for each value in the class column, and further subdivided by the sex column:

gurita bar -y age -x class --hue sex < titanic.csv

By default the order of the bars within each hue group is determined by the order in which their values occur in the input data. This can be overridden with the --hueorder argument, which allows you to specify the exact ordering of bars within each hue group, based on their values.

In the following example the sex values are displayed in the order of female, male:

gurita bar -y age -x class --hue sex --hueorder female male < titanic.csv

Bar plot showing the mean of age for each class in the titanic data set, grouped by class and sex, with sex order specified

It is also possible to use both --order and --hueorder in the same command. For example, the following command controls the order of both the class and sex categorical columns:

gurita bar -y age -x class --order First Second Third --hue sex --hueorder female male < titanic.csv

Bar plot showing the mean of age for each class in the titanic data set, grouped by class and sex, with class order and sex order specified

Log scale

--logx
--logy

The numerical axis of a bar plot can be displayed in log (base 10) scale with --logx and --logy.

It only makes sense to log-scale the numerical axis (and not the categorical axis). Therefore, --logx should be used when numerical columns are selected with -x, and conversely, --logy should be used when numerical columns are selected with -y.

For example, the following command displays the mean of age for each class with a log-scaled Y axis. Because the numerical data is displayed on the Y axis (-y), the --logy argument is used:

gurita bar -y age -x class --logy < titanic.csv

Bar plot showing the mean of age for each class in the titanic data set, with the Y axis plotted in log scale

Axis range limits

--xlim LOW HIGH
--ylim LOW HIGH

The range of displayed numerical columns can be restricted with --xlim and --ylim. Each of these flags takes two numerical values as arguments that represent the lower and upper bounds of the range to be displayed.

It only makes sense to range-limit the numerical axis (and not the categorical axis). Therefore, --xlim should be used when numerical columns are selected with -x, and conversely, --ylim should be used when numerical columns are selected with -y.

For example, the following command restricts the displayed range of the mean of age, grouped by class, to between 10 and 30. Because the numerical data is displayed on the Y axis (-y), the --ylim argument is used:

gurita bar -y age -x class --ylim 10 30 < titanic.csv

Bar plot showing the mean of age for each class in the titanic data set, with the Y axis limited to between 10 and 30