pair

Plot relationships between pairs of columns on a grid of axes.

By default, all numerical columns in the data are compared with one another. You can instead specify which columns to compare, and the selection may include categorical columns.

The diagonal of the grid shows a histogram of the distribution of each column.

Usage

gurita pair [-h] [-c [COLUMN ...]] [--kind {scatter,kde,hist,reg}] ... other arguments ...

Arguments

Argument                              Description                                   Reference

-h                                    display help for this command                 help

-c [COLUMN ...], --col [COLUMN ...]   select columns to compare                     columns

--hue COLUMN                          group data by a categorical column            hue

--corner                              only plot the lower left corner of the grid   corner

--kind {scatter,kde,hist,reg}         choose the kind of plot (default: scatter)    kind

See also

Pair plots are based on Seaborn’s pairplot library function.

Simple example

The following example generates a pair plot comparing all the numerical columns in the iris.csv data set.

Note that this data set has four numerical columns (sepal_width, sepal_length, petal_length, petal_width) and one categorical column (species). By default, if no column names are specified, a pair plot compares all the numerical columns and ignores the categorical columns. This behaviour can be overridden with the -c (or --col) argument.

gurita pair < iris.csv

The output of the above command is written to pair.png:

Pair plot comparing all the numerical columns in the iris.csv data set

Getting help

The full set of command line arguments for pair plots can be obtained with the -h or --help arguments:

gurita pair -h

Selecting columns to compare

-c [COLUMN ...], --col [COLUMN ...]

By default, if no column names are specified, a pair plot compares all the numerical columns and ignores the categorical columns. This behaviour can be overridden with the -c (or --col) argument.

The following example generates a pair plot comparing the sepal_length, petal_length, and species columns. Note that species is a categorical column, and it would not be plotted by default.

gurita pair -c sepal_length petal_length species < iris.csv
Pair plot comparing various specific columns in the iris.csv data set

Because the plot above is small, the tick labels for species on the X axis overlap one another. This can be avoided by rotating the X axis tick labels by 90 degrees using --rxtl 90:

gurita pair -c sepal_length petal_length species --rxtl 90 < iris.csv
Pair plot comparing various specific columns in the iris.csv data set, with X axis labels rotated 90 degrees to avoid overlapping

Group data by a categorical column with hue

--hue COLUMN

The plotted data can be grouped by a categorical column, where each group is rendered as a different colour.

In the example below, no column names are specified, so only the numerical columns are plotted.

gurita pair --hue species < iris.csv
Pair plot comparing all the numerical columns in the iris.csv data set, grouped by species

Only plot the lower left corner of the grid

--corner

By default a pair plot shows a full square grid, with one plot for every ordered pair of columns. Because each pair of columns therefore appears twice (once with the axes swapped), the top right and bottom left triangles of the grid are mirror images of one another. An alternative is to plot only the bottom left corner of the grid using the --corner argument.

gurita pair --corner < iris.csv
Pair plot comparing all the numerical columns in the iris.csv data set, showing only the lower left corner
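The effect on the number of panels can be worked through with a small, library-free sketch. It assumes the corner layout keeps the diagonal, as Seaborn's corner option does; the variable names are illustrative only.

```python
# The four numerical columns of iris.csv.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Full grid: one panel per (row, column) combination, diagonal included.
full_grid = [(row, col) for row in columns for col in columns]

# Corner grid: keep a panel only if its column index is not greater
# than its row index (assumption: the diagonal is kept).
corner_grid = [
    (columns[i], columns[j])
    for i in range(len(columns))
    for j in range(i + 1)
]

# Every dropped panel has a mirrored counterpart that is still shown.
dropped = [panel for panel in full_grid if panel not in corner_grid]
mirrored = all((col, row) in corner_grid for row, col in dropped)

print(len(full_grid), len(corner_grid), mirrored)  # 16 10 True
```

For four columns the full grid has 16 panels and the corner layout has 10, and no pairwise comparison is lost.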

Choose the kind of plot

--kind {scatter,kde,hist,reg}

By default pair plots use a scatter plot to compare two numerical columns. This can be changed with the --kind argument, which allows you to choose from four plot types:

  1. scatter (the default)

  2. kde (kernel density estimate)

  3. hist (histogram)

  4. reg (regression)

The example below shows a pair plot using kde (kernel density estimate) as the method of comparison:

gurita pair --kind kde < iris.csv
Pair plot comparing all the numerical columns in the iris.csv data set, using a kde to compare columns

The example below shows a pair plot using hist (histogram) as the method of comparison:

gurita pair --kind hist < iris.csv
Pair plot comparing all the numerical columns in the iris.csv data set, using a histogram to compare columns

The example below shows a pair plot using reg (regression) as the method of comparison:

gurita pair --kind reg < iris.csv
Pair plot comparing all the numerical columns in the iris.csv data set, using a regression to compare columns