bar
Bar plots summarise a numerical column as boxes with optional error bars.
By default the numerical column is summarised by its mean, but other summary functions can be chosen.
Usage
usage: gurita bar [-h] [-x COLUMN] [-y COLUMN] ... other arguments ...
Arguments
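Argument | Description | Reference |
---|---|---|
-h, --help | display help for this command | Getting help |
-x COLUMN, --xaxis COLUMN | select column for the X axis | Selecting columns to plot |
-y COLUMN, --yaxis COLUMN | select column for the Y axis | Selecting columns to plot |
--orient | Orientation of plot. Allowed values: v = vertical, h = horizontal. Default: v. | Selecting columns to plot |
--estimator | Function to compute point estimate of numerical column | Summary function |
--sd | show standard deviation of numerical column as error bar | Standard deviation |
--ci | Show confidence interval as error bar to estimate uncertainty of point estimate | Confidence interval |
--order VALUE [VALUE...] | controlling the order of the plotted bars | Controlling the order of the bars |
--hue COLUMN | colour and/or group columns by hue | Colour and/or group columns with hue |
--hueorder | order of hue columns | Colour and/or group columns with hue |
--logx | log scale X axis (only relevant when the numerical column is on the X axis) | Log scale |
--logy | log scale Y axis | Log scale |
--xlim LOW HIGH | range limit X axis | Axis range limits |
--ylim LOW HIGH | range limit Y axis | Axis range limits |
--frow COLUMN | column to use for facet rows | Facets |
--fcol COLUMN | column to use for facet columns | Facets |
--fcolwrap INT | wrap the facet column at this width, to span multiple rows | Facets |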
See also
Similar functionality to bar plots is provided by:
Bar plots are based on Seaborn’s catplot library function, using the kind="bar" option.
Simple example
Bar plot the mean age of passengers for each value of class in the titanic.csv input file:
gurita bar -y age -x class < titanic.csv
The output of the above command is written to bar.class.age.png.
Getting help
The full set of command line arguments for bar plots can be obtained with the -h or --help arguments:
gurita bar -h
Selecting columns to plot
-x COLUMN, --xaxis COLUMN
-y COLUMN, --yaxis COLUMN
Bar plots can be plotted for numerical columns and optionally grouped by categorical columns.
If no categorical column is specified, a single column bar plot will be generated showing a summary (mean by default) of the numerical column.
Note
By default the orientation of the bar plot is vertical. In this scenario the numerical column is specified by -y, and the (optional) categorical column is specified by -x.
However, the orientation of the bar plot can be made horizontal using the --orient h argument. In this case the roles of the X and Y axes are swapped from the default, and thus the numerical column is specified by -x, and the (optional) categorical column is specified by -y.
In the following example the mean of age is shown for each value in the class column, where the bars are plotted horizontally:
gurita bar -x age -y class --orient h < titanic.csv
Summary function
By default bar plots show the mean of the selected numerical column. However, alternative functions can be chosen using the --estimator argument.
The allowed choices are: mean, median, max, min, sum, std (standard deviation), var (variance).
For example, the following command shows the maximum age for each value of class:
gurita bar -y age -x class --estimator max < titanic.csv
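To make the difference between estimators concrete, the following sketch computes the mean and the maximum per group with awk on a small made-up table. The data and the helper names toy_csv and summarise are hypothetical and only illustrate the arithmetic; this is not a description of how Gurita computes its summaries internally.

```shell
# Toy data only (hypothetical values, not taken from titanic.csv).
toy_csv() {
cat <<'EOF'
class,age
First,40
First,50
Second,30
Third,20
Third,30
Third,10
EOF
}

# Read a two-column CSV (group,value) with a header row and print
# "group mean max" for each group, sorted by group name.
summarise() {
    awk -F, 'NR > 1 {
        v = $2 + 0                      # force numeric comparison
        n[$1]++
        sum[$1] += v
        if (!($1 in max) || v > max[$1]) max[$1] = v
    }
    END {
        for (g in n) printf "%s %g %g\n", g, sum[g] / n[g], max[g]
    }' | sort
}

toy_csv | summarise
# First 45 50
# Second 30 30
# Third 20 30
```

In this toy table the second column is the bar height you would expect with --estimator mean and the third is the height you would expect with --estimator max.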
Standard deviation
The standard deviation of the numerical column can be shown as an error bar with the --sd argument.
By default, if --sd is specified without a numerical argument, then +/- 1 standard deviation from the mean is shown, but this can be changed by supplying a specific numeric value.
For example, the mean and standard deviation of age are shown for each value in the class column:
gurita bar -y age -x class --sd < titanic.csv
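As a rough illustration of what such an error bar spans, the sketch below computes the mean and the interval of mean -/+ K standard deviations for a short list of made-up numbers. The helper name sd_bar is hypothetical, and the sketch assumes the sample standard deviation (n - 1 divisor); this page does not say which convention Gurita uses.

```shell
# sd_bar K: read one number per line and print "mean low high", where
# low and high are the mean minus/plus K sample standard deviations.
# Assumption: n - 1 divisor (sample standard deviation).
sd_bar() {
    awk -v k="$1" '{ n++; sum += $1; sumsq += $1 * $1 }
    END {
        mean = sum / n
        sd = sqrt((sumsq - n * mean * mean) / (n - 1))
        printf "%g %g %g\n", mean, mean - k * sd, mean + k * sd
    }'
}

printf '10\n20\n30\n' | sd_bar 1   # 20 10 30
printf '10\n20\n30\n' | sd_bar 2   # 20 0 40
```

Under these assumptions, the first call corresponds to the default of 1 standard deviation, and the second to supplying 2 as the numeric value.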
Confidence interval
The confidence interval of the summary estimate can be shown as an error bar with the --ci argument.
By default, if --ci is specified without a numerical argument, then the 95% confidence interval is shown, but this can be changed by supplying a specific numeric value.
For example, the mean of age and its 98% confidence interval are shown for each value in the class column:
gurita bar -y age -x class --ci 98 < titanic.csv
Controlling the order of the bars
--order VALUE [VALUE...]
By default the order of the categorical columns displayed in the bar plot is determined from their occurrence in the input data.
This can be overridden with the --order argument, which allows you to specify the exact ordering of columns based on their values.
In the following example the bars for the class column are displayed in the order of First, Second, Third:
gurita bar -y age -x class --order First Second Third < titanic.csv
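Before choosing values for --order it can help to see the order in which the categories first occur in the input. The following is a minimal sketch using awk; the helper name first_seen and the sample rows are hypothetical, and it assumes a simple CSV with a header row and no quoted fields containing commas.

```shell
# first_seen COLNUM: print the distinct values of the given CSV column
# in order of first occurrence, skipping the header row.
# Assumption: fields are split on plain commas (no quoted commas).
first_seen() {
    awk -F, -v col="$1" 'NR > 1 && !seen[$col]++ { print $col }'
}

printf 'class,age\nThird,22\nFirst,38\nThird,26\nSecond,14\nFirst,35\n' | first_seen 1
# Third
# First
# Second
```

For this toy input the first-occurrence order is Third, First, Second, which is the kind of ordering that --order First Second Third would replace.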
Colour and/or group columns with hue
--hue COLUMN
Each bar can be coloured and optionally subdivided into additional categories with the --hue argument.
The following example generates a bar plot showing the mean age of titanic passengers across the three different ticket classes, where each class is coloured differently:
gurita bar -y age -x class --hue class < titanic.csv
In the following example the mean and error of age are shown for each value in the class column, and further sub-divided by the sex column:
gurita bar -y age -x class --hue sex < titanic.csv
By default the order of the columns within each hue group is determined from their occurrence in the input data.
This can be overridden with the --hueorder argument, which allows you to specify the exact ordering of columns within each hue group, based on their values.
In the following example the sex values are displayed in the order of female, male:
gurita bar -y age -x class --hue sex --hueorder female male < titanic.csv
It is also possible to use both --order and --hueorder in the same command. For example, the following command controls the order of both the class and sex categorical columns:
gurita bar -y age -x class --order First Second Third --hue sex --hueorder female male < titanic.csv
Log scale
--logx
--logy
The mean of numerical values can be displayed in log (base 10) scale with --logx and --logy.
It only makes sense to log-scale the numerical axis (and not the categorical axis). Therefore, --logx should be used when numerical columns are selected with -x, and conversely, --logy should be used when numerical columns are selected with -y.
For example, you can display a log scale bar plot for the age column grouped by class (when the mean of age is displayed on the Y axis) like so. Note carefully that the numerical data is displayed on the Y-axis (-y), therefore the --logy argument should be used to log-scale the numerical mean:
gurita bar -y age -x class --logy < titanic.csv
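On a base-10 log axis a value is positioned according to its base-10 logarithm, so each tenfold increase takes up the same distance. The sketch below prints that logarithm for a few made-up bar heights; the helper name log10_of is hypothetical and is only meant to illustrate the scaling.

```shell
# log10_of VALUE: print the base-10 logarithm of a positive number.
log10_of() {
    awk -v v="$1" 'BEGIN { printf "%g\n", log(v) / log(10) }'
}

# Heights 1, 10 and 100 land at evenly spaced positions 0, 1 and 2.
for height in 1 10 100; do
    log10_of "$height"
done
```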
Axis range limits
--xlim LOW HIGH
--ylim LOW HIGH
The range of displayed numerical columns can be restricted with --xlim and --ylim. Each of these flags takes two numerical values as arguments that represent the lower and upper bounds of the range to be displayed.
It only makes sense to range-limit the numerical axis (and not the categorical axis). Therefore, --xlim should be used when numerical columns are selected with -x, and conversely, --ylim should be used when numerical columns are selected with -y.
For example, you can display a range-limited bar plot for the age column grouped by class (when age is displayed on the Y axis) like so. Note carefully that the numerical data is displayed on the Y-axis (-y), therefore the --ylim argument should be used to range-limit the mean:
gurita bar -y age -x class --ylim 10 30 < titanic.csv
Facets
--frow COLUMN
--fcol COLUMN
--fcolwrap INT
Bar plots can be further divided into facets, generating a matrix of bar plots, where a numerical value is further categorised by up to 2 more categorical columns.
See the facet documentation for more information on this feature.
The following command creates a faceted bar plot where the sex column is used to determine the facet columns:
gurita bar -y age -x class --fcol sex < titanic.csv