heatmap

Heatmap showing the relationship between two categorical columns and a numerical column.

Usage

gurita heatmap [-h] -x COLUMN -y COLUMN -v COLUMN ... other arguments ...

Arguments

Argument	Description	Reference
`-h`	display help	help
`-x COLUMN` `--xaxis COLUMN`	select categorial column for the X axis	X axis
`-y COLUMN` `--yaxis COLUMN`	select categorical column for the Y axis	Y axis
`-v COLUMN` `--val COLUMN`	select intensity value for heatmap	value
`--cmap COLOR_MAP_NAME`	colour map for the heat map	colour map
`--annot [FORMAT]`	show the value as text in cells	annotate
`--vmin NUM`	minimum anchor value for the colormap	minimum colormap value
`--vmax NUM`	maximum anchor value for the colormap	maximum colormap value
`--robust`	use robust quantiles to set colormap range	robust quantiles
`--sortx [{a,d}]]`	sort the X axis by value, allowed values: a, d. a=ascending, d=descending, default: a.	sort X axis
`--sorty [{a,d}]]`	sort the Y axis by value, allowed values: a, d. a=ascending, d=descending, default: a.	sort Y axis
`--orderx VALUE [VALUE ...]`	order the X axis according to a given list of values	order X axis by value
`--ordery VALUE [VALUE ...]`	order the Y axis according to a given list of values	order Y axis by value

Simple example

Heatmap showing the number of passengers by month and year in the flights.csv data set:

gurita heatmap -y year -x month -v passengers < flights.csv

The output of the above command is written to heatmap.month.year.png:

Heatmap showing the number of passengers by month and year in the flights.csv data set

Getting help

The full set of command line arguments for heatmap plots can be obtained with the -h or --help arguments:

gurita heatmap -h

Selecting columns to plot

-x COLUMN, --xaxis COLUMN
-y COLUMN, --yaxis COLUMN

The X and Y axes of a heatmap must be categorical columns. The data must be formatted such that in each row the pair of values (X, Y) is unique (not repeated). If your data is not in this format it may be possible to transform it into this format using pivot.

The example below shows the same heatmap the simple example above but with the month on the Y axis and the year on the X axis:

gurita heatmap -y month -x year -v passengers < flights.csv

Colour map

--cmap COLOR_MAP_NAME

The colour map used in the heatmap can be set explicitly using --cmap with the name of the colour map as its argument.

Gurita uses Matlplotlib’s colour map names (because Gurita uses Seaborn to draw that heatmap, and Seaborn is built on top of Matplotlib).

The example below uses the YlOrRd (yellow-orange-red) colour map:

gurita heatmap -y year -x month -v passengers --cmap YlOrRd < flights.csv

Heatmap showing the number of passengers by month and year in the flights.csv data set, using the YlOrRd colour map

Show the value as text in each cell

--annot [FORMAT]

The --annot option will display the numerical value as text in each cell of the heatmap. The optional argument FORMAT is a string that specifies how to display the numeric value as text. The format string uses Python’s format specification language. It defaults to d which displays the value as a decimal integer.

For real numbers (floating point) you may want to use a format like .2g which will display the number in scientific notation with 2 decimal places.

gurita heatmap -y year -x month -v passengers --annot < flights.csv

Heatmap showing the number of passengers by month and year in the flights.csv data set, with the numeric value in each cell shown as text

Control the range of values used in the colour map

--vmin NUM
--vmax NUM
--robust

The upper and lower bounds of the values displayed in the heatmap are chosen from the data by default, but they can be ajusted with --vmin and --vmax, setting the lower and upper bounds respectively. It is possible to set one or both bounds at the same time.

In the example below the lower bound is set to 250 and the upper bound is set to 550. Values outside these bounds are clamped to the bounding values.

We observe that in this example data set it wasn’t until the early 1950s that the number of passengers per flight exceeded 250, hence the predominance of black cells in the top part of the plot.

gurita heatmap -y year -x month -v passengers --vmin 250 --vmax 550 < flights.csv

Heatmap showing the number of passengers by month and year in the flights.csv data set, with the minimum and maximum range of values specified.

Alternatively, the --robust argument will cause the maximum and minimum values to be chosen based on quantiles, which can be desirable when extreme outliers occur in the data. Note that --robust may not be used at the same time as --vmin and/or --vmax.

gurita heatmap -y year -x month -v passengers --robust < flights.csv

Heatmap showing the number of passengers by month and year in the flights.csv data set, with the quantiles used to determine the minimum and maximum values.

Control the order of the columns and rows

--sortx [{a,d}]]
--sorty [{a,d}]]
--orderx VALUE [VALUE]
--ordery VALUE [VALUE]

The default ordering of values on the X and Y axis is determined by their relative order in the input data. In many cases this is not the best order to display in the heatmap.

Therefore the order of the values on the axes can be either sorted, using --sortx and --sorty, or manually specified using --orderx and --ordery.

Both sort arguments accept an optional argument that specifies the direction of the sort: a for ascending and d for descending, where the order of rows is considered from top to bottom and the order of columns is considered from left to right.

Categorical columns will be sorted alphabetically. Numerical columns will be sorted numerically.

If a specific order of values is required then this can be achived with --orderx and --ordery. Both of these arguments require one or more values to be specified, though it is possible to specify only a subset of all the possible values. Any unlisted values will be ordered arbitrarily. This can be useful when the relative order of only a few values is important.

The example below generates a heatmap with the values on the Y axes displayed in descending sorted order:

gurita heatmap -y year -x month -v passengers --sorty d < flights.csv

Heatmap showing the number of passengers by month and year in the flights.csv data set, with the values on the Y axis in sorted descending order

The example below generates a heatmap with the first four values on the X axis shown in a specific order, namely: January, February, March, April. Note that the complete ordering of the twelve possible months is not specified. Thus the last eight months are shown in an arbitary order. If we wanted to specifiy the full order then the first eleven months would need to be specified.

gurita heatmap -y year -x month -v passengers --orderx January February March April < flights.csv

Heatmap showing the number of passengers by month and year in the flights.csv data set, with the values on the X axis specified in a particular partial order