Plotting options

Note

You can try this notebook in your browser: Binder

Backends

Clustergram offers two types of plots: static and interactive. Static plots use matplotlib, while interactive plots are based on bokeh.

Let’s load the data and fit a clustergram on the Palmer penguins dataset. See the Introduction for an overview of the dataset.

import seaborn
from sklearn.preprocessing import scale
from clustergram import Clustergram

df = seaborn.load_dataset('penguins')
data = scale(df.drop(columns=['species', 'island', 'sex']).dropna())

cgram = Clustergram(range(1, 12), verbose=False)
cgram.fit(data)

Static plots

Static plots can be generated using the Clustergram.plot() method.

cgram.plot()
<AxesSubplot:xlabel='Number of clusters (k)', ylabel='PCA weighted mean of the clusters'>

Styling

Clustergram.plot() returns a matplotlib Axes object, which can be customised like any other matplotlib plot. You can pass keyword arguments that control the style of cluster centers as a cluster_style dictionary, and arguments that control the lines as a line_style dictionary. Global styles, like those in seaborn, also work.

seaborn.set(style='whitegrid')

cgram.plot(
    size=0.5,
    linewidth=0.5,
    cluster_style={"color": "lightblue", "edgecolor": "black"},
    line_style={"color": "red", "linestyle": "-."},
    figsize=(12, 8)
)
<AxesSubplot:xlabel='Number of clusters (k)', ylabel='PCA weighted mean of the clusters'>
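
Because the method returns a regular matplotlib Axes, the usual Axes methods can be applied after the call. The sketch below uses an empty Axes from plt.subplots() as a stand-in for the object returned by cgram.plot(), so it runs without the fitted model. The helper name annotate_axes, the title text and the axis limits are illustrative choices, not part of clustergram.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt


def annotate_axes(ax, title, ylim):
    """Apply a title, y-limits and a dashed horizontal grid to an Axes."""
    ax.set_title(title)
    ax.set_ylim(*ylim)
    ax.grid(axis="y", linestyle="--", alpha=0.5)
    return ax


# Stand-in Axes; with a fitted model you would use `ax = cgram.plot()` instead.
fig, ax = plt.subplots(figsize=(12, 8))
annotate_axes(ax, "Clustergram of Palmer penguins", (-3, 3))
```

The same calls should work on the Axes returned by Clustergram.plot(), since nothing here depends on how the Axes was created.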

Partial plot

Clustergram.plot() can also plot only a part of the diagram, if you want to focus on a limited range of k.

cgram.plot(k_range=range(3, 10), figsize=(12, 8))
<AxesSubplot:xlabel='Number of clusters (k)', ylabel='PCA weighted mean of the clusters'>

Saving plot

Clustergram.plot() returns a matplotlib Axes object, so the plot can be saved like any other matplotlib plot:

import matplotlib.pyplot as plt

cgram.plot()
plt.savefig('clustergram.svg')
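
An alternative is to save through the Figure that owns the returned Axes, which avoids relying on pyplot's idea of the current figure. This is a minimal sketch: the empty Axes stands in for the result of cgram.plot(), and the output path, dpi and bbox_inches values are illustrative.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # file output only; omit this line in a notebook
import matplotlib.pyplot as plt

# Stand-in Axes; with a fitted model you would use `ax = cgram.plot()` instead.
_, ax = plt.subplots()

# Every Axes keeps a reference to its parent Figure in `ax.figure`.
out_path = os.path.join(tempfile.gettempdir(), "clustergram.png")
ax.figure.savefig(out_path, dpi=300, bbox_inches="tight")
```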

Interactive plots

Interactive plots can be generated using the Clustergram.bokeh() method. The method returns a bokeh Figure object, and it is up to you what to do with it. If you are using a Jupyter notebook, probably the best option is to show it directly in the cell. For that, you need to load the BokehJS interface using output_notebook() and then call show().

This requires bokeh, which is only an optional dependency, so you may need to install it first:

conda install bokeh

Or

pip install bokeh

from bokeh.io import output_notebook
from bokeh.plotting import show

output_notebook()
fig = cgram.bokeh()
show(fig)

This clustergram allows you to interactively zoom to specific parts of the diagram and also shows the number of observations in each cluster alongside its label. You can retrieve the labels for each observation and each iteration using Clustergram.labels.
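
Once the labels are retrieved, they can be summarised with pandas. The sketch below assumes that Clustergram.labels is a pandas DataFrame with one row per observation and one column per tested k; that layout is an assumption worth checking against your installed version. The small DataFrame is a made-up stand-in for cgram.labels so the example runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for `cgram.labels`: rows are observations,
# columns are the tested values of k, cells are cluster labels.
labels = pd.DataFrame(
    {
        1: [0, 0, 0, 0, 0, 0],
        2: [0, 0, 1, 1, 1, 0],
        3: [0, 2, 1, 1, 1, 0],
    }
)

# Number of observations in each cluster for k=3.
counts_k3 = labels[3].value_counts().sort_index()

# Cross-tabulate two solutions to see how clusters split between k=2 and k=3.
transitions = pd.crosstab(labels[2], labels[3])
```

The cross-tabulation mirrors what the lines of the diagram show visually: how many observations move from each cluster at one k to each cluster at the next.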

Styling

The bokeh plot can be customised in much the same way as the static one, using style dictionaries.

fig = cgram.bokeh(
    size=0.5,
    line_width=0.5,
    cluster_style={"color": "lightblue", "line_color": "black"},
    line_style={"color": "red", "line_dash": "dotted", "line_cap": "butt"},
    figsize=(700, 500)
)
show(fig)

Saving

You can also save the bokeh plot as HTML to retain its interactivity, using save() instead of show().

from bokeh.plotting import output_file, save

output_file("clustergram.html")
save(fig)

Mean options

On the y-axis, a clustergram can use mean values, as in the original paper by Matthias Schonlau, or PCA weighted mean values, as in the implementation by Tal Galili. PCA weighted plots are the default, as they help distinguish between different branches and make interpretation a bit easier. Both plotting backends support the same option.

cgram.plot(figsize=(12, 8), pca_weighted=True)
<AxesSubplot:xlabel='Number of clusters (k)', ylabel='PCA weighted mean of the clusters'>
cgram.plot(figsize=(12, 8), pca_weighted=False)
<AxesSubplot:xlabel='Number of clusters (k)', ylabel='Mean of the clusters'>
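
The difference between the two y-axis options can be sketched with plain NumPy. This is a simplified illustration, not clustergram's internal code: it assumes the plain option averages each cluster center across features, while the PCA weighted option projects each center onto the first principal component of the data. The toy data and the cluster centers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy standardised data: 100 observations, 4 features (made-up values).
data = rng.normal(size=(100, 4))
data = (data - data.mean(axis=0)) / data.std(axis=0)

# Hypothetical cluster centers for k=3, one row per cluster.
centers = np.array(
    [
        [-1.0, -0.5, 0.2, 0.1],
        [0.0, 0.3, -0.4, 0.6],
        [1.2, 0.8, 0.5, -0.3],
    ]
)

# Plain mean: average each center across the features.
plain_means = centers.mean(axis=1)

# PCA weighted mean: project each center onto the first principal component.
# The rows of `vt` from the SVD of the centered data are the principal axes.
_, _, vt = np.linalg.svd(data, full_matrices=False)
first_component = vt[0]
pca_weighted_means = centers @ first_component
```

Under these assumptions, the plain mean weights every feature equally, while the projection weights features by their contribution to the main direction of variance, which tends to spread the cluster positions further apart on the y-axis.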