heatmap

Heatmap showing the relationship between two categorical columns and a numerical column.

Usage

gurita heatmap [-h] -x COLUMN -y COLUMN -v COLUMN ... other arguments ...

Arguments

Argument

Description

Reference

-h

display help

help

  • -x COLUMN

  • --xaxis COLUMN

select categorical column for the X axis

X axis

  • -y COLUMN

  • --yaxis COLUMN

select categorical column for the Y axis

Y axis

  • -v COLUMN

  • --val COLUMN

select intensity value for heatmap

value

--cmap COLOR_MAP_NAME

colour map for the heat map

colour map

--annot [FORMAT]

show the value as text in cells

annotate

--vmin NUM

minimum anchor value for the colormap

minimum colormap value

--vmax NUM

maximum anchor value for the colormap

maximum colormap value

--robust

use robust quantiles to set colormap range

robust quantiles

--sortx [{a,d}]

sort the X axis by value, allowed values: a, d. a=ascending, d=descending, default: a.

sort X axis

--sorty [{a,d}]

sort the Y axis by value, allowed values: a, d. a=ascending, d=descending, default: a.

sort Y axis

--orderx VALUE [VALUE ...]

order the X axis according to a given list of values

order X axis by value

--ordery VALUE [VALUE ...]

order the Y axis according to a given list of values

order Y axis by value

See also

Cluster maps combine heatmaps with clustering.

Heatmap plots are based on Seaborn’s heatmap library function.

Simple example

Heatmap showing the number of passengers by month and year in the flights.csv data set:

gurita heatmap -y year -x month -v passengers < flights.csv

The output of the above command is written to heatmap.month.year.png:

Heatmap showing the number of passengers by month and year in the flights.csv data set

Getting help

The full set of command line arguments for heatmap plots can be obtained with the -h or --help arguments:

gurita heatmap -h

Selecting columns to plot

-x COLUMN, --xaxis COLUMN
-y COLUMN, --yaxis COLUMN

The X and Y axes of a heatmap must be categorical columns. The data must be formatted such that in each row the pair of values (X, Y) is unique (not repeated). If your data is not in this format it may be possible to transform it into this format using pivot.

The example below shows the same heatmap as the simple example above, but with the month on the Y axis and the year on the X axis:

gurita heatmap -y month -x year -v passengers < flights.csv
Heatmap showing the number of passengers by month and year in the flights.csv data set
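
The uniqueness requirement on (X, Y) pairs described above can be checked before plotting. The following is a minimal Python sketch, not part of Gurita: the helper name duplicate_pairs and the sample rows are hypothetical and only illustrate the idea.

```python
import csv
import io
from collections import Counter


def duplicate_pairs(rows, x, y):
    """Return the (x, y) value pairs that occur in more than one row."""
    counts = Counter((row[x], row[y]) for row in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Illustrative sample only, not taken from flights.csv.
sample = io.StringIO(
    "year,month,passengers\n"
    "1949,January,10\n"
    "1949,February,12\n"
    "1949,January,11\n"
)
rows = list(csv.DictReader(sample))

# The pair (January, 1949) appears twice, so it is reported.
print(duplicate_pairs(rows, x="month", y="year"))
```

An empty result means every (X, Y) pair is unique. A non-empty result suggests the data needs to be aggregated or reshaped first, for example with pivot.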

Colour map

--cmap COLOR_MAP_NAME

The colour map used in the heatmap can be set explicitly using --cmap with the name of the colour map as its argument.

Gurita uses Matplotlib’s colour map names (because Gurita uses Seaborn to draw the heatmap, and Seaborn is built on top of Matplotlib).

The example below uses the YlOrRd (yellow-orange-red) colour map:

gurita heatmap -y year -x month -v passengers --cmap YlOrRd < flights.csv
Heatmap showing the number of passengers by month and year in the flights.csv data set, using the YlOrRd colour map

Show the value as text in each cell

--annot [FORMAT]

The --annot option will display the numerical value as text in each cell of the heatmap. The optional argument FORMAT is a string that specifies how to display the numeric value as text. The format string uses Python’s format specification language. It defaults to d which displays the value as a decimal integer.

For real numbers (floating point) you may want to use a format like .2g, which displays the number with 2 significant digits, using scientific notation where needed (for example, 1234.5 is shown as 1.2e+03).

gurita heatmap -y year -x month -v passengers --annot < flights.csv
Heatmap showing the number of passengers by month and year in the flights.csv data set, with the numeric value in each cell shown as text
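
Because FORMAT follows Python's format specification language, candidate format strings can be tried in Python before passing them to --annot. This is a small sketch using the built-in format() function; the sample values are illustrative only.

```python
# Each pair is (value, format specification).
samples = [
    (112, "d"),         # decimal integer
    (0.123456, ".2g"),  # 2 significant digits
    (123.456, ".2g"),   # 2 significant digits, scientific notation
    (123.456, ".1f"),   # fixed point with 1 decimal place
]

formatted = [format(value, spec) for value, spec in samples]
print(formatted)  # ['112', '0.12', '1.2e+02', '123.5']
```

In plain Python the d format accepts only integers, so a fixed point format such as .1f may be a better fit for floating point values.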

Control the range of values used in the colour map

--vmin NUM
--vmax NUM
--robust

The upper and lower bounds of the values displayed in the heatmap are chosen from the data by default, but they can be adjusted with --vmin and --vmax, which set the lower and upper bounds respectively. Either bound can be set on its own, or both can be set together.

In the example below the lower bound is set to 250 and the upper bound is set to 550. Values outside these bounds are clamped to the bounding values.

We observe that in this example data set it wasn’t until the early 1950s that the monthly number of passengers exceeded 250, hence the predominance of black cells in the top part of the plot.

gurita heatmap -y year -x month -v passengers --vmin 250 --vmax 550 < flights.csv
Heatmap showing the number of passengers by month and year in the flights.csv data set, with the minimum and maximum range of values specified.
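
The clamping behaviour described above can be sketched in a few lines of Python. This illustrates the idea only and is not Gurita's or Seaborn's code: the function name clamp and the sample values are hypothetical.

```python
def clamp(value, vmin=None, vmax=None):
    """Limit value to the range [vmin, vmax]; a bound of None is ignored."""
    if vmin is not None and value < vmin:
        return vmin
    if vmax is not None and value > vmax:
        return vmax
    return value


# Illustrative values only.
values = [112, 250, 400, 622]
clamped = [clamp(v, vmin=250, vmax=550) for v in values]
print(clamped)  # [250, 250, 400, 550]
```

Under this sketch, every value at or below the lower bound is drawn in the same colour, as is every value at or above the upper bound.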

Alternatively, the --robust argument will cause the maximum and minimum values to be chosen based on quantiles, which can be desirable when extreme outliers occur in the data. Note that --robust may not be used at the same time as --vmin and/or --vmax.

gurita heatmap -y year -x month -v passengers --robust < flights.csv
Heatmap showing the number of passengers by month and year in the flights.csv data set, with the quantiles used to determine the minimum and maximum values.
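
To illustrate why a quantile-based range is less sensitive to outliers, the sketch below compares the plain minimum and maximum with a percentile-based range. The percentile levels (2 and 98) are an assumption, chosen to mirror what Seaborn's heatmap is understood to use; they are not stated in this documentation. The function names and data are hypothetical.

```python
def percentile(values, q):
    """Return the q-th percentile (0 to 100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def robust_range(values, low=2, high=98):
    """Colour range from percentiles; the 2/98 defaults are an assumption."""
    return percentile(values, low), percentile(values, high)


# Illustrative data: the integers 0..99 plus one extreme outlier.
values = list(range(100)) + [10000]

print(min(values), max(values))  # 0 10000
print(robust_range(values))      # (2.0, 98.0)
```

With the plain range, the single outlier stretches the colour scale so that almost all cells share nearly the same colour. The percentile-based range keeps the scale focused on the bulk of the data.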

Control the order of the columns and rows

--sortx [{a,d}]
--sorty [{a,d}]
--orderx VALUE [VALUE ...]
--ordery VALUE [VALUE ...]

The default ordering of values on the X and Y axes is determined by their relative order in the input data. In many cases this is not the best order to display in the heatmap.

Therefore the order of the values on the axes can be either sorted, using --sortx and --sorty, or manually specified using --orderx and --ordery.

Both sort arguments accept an optional argument that specifies the direction of the sort: a for ascending and d for descending, where the order of rows is considered from top to bottom and the order of columns is considered from left to right.

Categorical columns will be sorted alphabetically. Numerical columns will be sorted numerically.
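
The difference between alphabetical and numerical sorting can be seen with Python's built-in sorted() function. This is only an illustration of the two orderings, using made-up values, and not Gurita's implementation.

```python
# Categorical values sort alphabetically, which is rarely calendar order.
months = ["January", "February", "March", "April"]
print(sorted(months))  # ['April', 'February', 'January', 'March']

# Numerical values sort by magnitude; reverse=True gives descending order.
years = [1951, 1949, 1950]
print(sorted(years, reverse=True))  # [1951, 1950, 1949]
```

The first result shows why sorting is often unsuitable for month names, and why a manually specified order can be preferable for such columns.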

If a specific order of values is required then this can be achieved with --orderx and --ordery. Both of these arguments require one or more values to be specified, though it is possible to specify only a subset of all the possible values. Any unlisted values will be ordered arbitrarily. This can be useful when the relative order of only a few values is important.

The example below generates a heatmap with the values on the Y axis displayed in descending sorted order:

gurita heatmap -y year -x month -v passengers --sorty d < flights.csv
Heatmap showing the number of passengers by month and year in the flights.csv data set, with the values on the Y axis in sorted descending order

The example below generates a heatmap with the first four values on the X axis shown in a specific order, namely: January, February, March, April. Note that the complete ordering of the twelve possible months is not specified. Thus the last eight months are shown in an arbitrary order. If we wanted to specify the full order then the first eleven months would need to be specified.

gurita heatmap -y year -x month -v passengers --orderx January February March April < flights.csv
Heatmap showing the number of passengers by month and year in the flights.csv data set, with the values on the X axis specified in a particular partial order
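
A partial ordering of the kind described above can be sketched in Python as follows. The function name apply_partial_order is hypothetical. Placing the unlisted values after the listed ones in their input order is an assumption of this sketch, since the documentation only says that unlisted values are ordered arbitrarily.

```python
def apply_partial_order(values, order):
    """Place the values named in order first, then the rest in input order."""
    present = set(values)
    listed = set(order)
    first = [v for v in order if v in present]
    rest = [v for v in values if v not in listed]
    return first + rest


# Illustrative axis values in the order they appear in the input.
values = ["March", "June", "January", "February", "April", "May"]
order = ["January", "February", "March", "April"]

print(apply_partial_order(values, order))
# ['January', 'February', 'March', 'April', 'June', 'May']
```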