Plotting options#

Clustergram offers two types of plots: static and interactive. Static plots use matplotlib, while interactive plots are based on bokeh.

Backends#

Let’s load the data and fit a clustergram on the Palmer penguins dataset. See the Introduction for its overview.

import seaborn
from sklearn.preprocessing import scale
from clustergram import Clustergram

df = seaborn.load_dataset('penguins')
data = scale(df.drop(columns=['species', 'island', 'sex']).dropna())

cgram = Clustergram(range(1, 12), n_init=10, verbose=False)
cgram.fit(data)
Clustergram(k_range=range(1, 12), backend='sklearn', method='kmeans', kwargs={'n_init': 10})

Static plots#

Static plots can be generated using the Clustergram.plot() method.

cgram.plot();

Styling#

Clustergram.plot() returns a matplotlib Axes object, so the result can be customised like any other matplotlib plot. You can pass keyword arguments that control the style of cluster centers as a cluster_style dictionary, and arguments that control the lines as a line_style dictionary. Global styles, such as those in seaborn, also work.

seaborn.set(style='whitegrid')

cgram.plot(
    size=0.5,
    linewidth=0.5,
    cluster_style={"color": "lightblue", "edgecolor": "black"},
    line_style={"color": "red", "linestyle": "-."},
    figsize=(12, 8)
);
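Because the return value is an ordinary matplotlib Axes, the usual Axes methods can be applied after plotting. The sketch below is an illustration only: annotate_clustergram is a hypothetical helper that is not part of clustergram, the title and label texts are placeholders, and a stand-in Axes is used so that the snippet runs on its own.

```python
from matplotlib.figure import Figure


def annotate_clustergram(ax, title, ylabel=None):
    """Add a title (and optionally a y label) to an existing matplotlib Axes.

    Hypothetical helper for illustration; it only calls standard Axes methods.
    """
    ax.set_title(title)
    if ylabel is not None:
        ax.set_ylabel(ylabel)
    return ax


# Stand-in Axes so the snippet is self-contained. In practice, pass the
# Axes returned by cgram.plot() instead.
demo_ax = Figure().subplots()
annotate_clustergram(demo_ax, "Palmer penguins", ylabel="Weighted mean")
```

With the fitted object from above, the same helper would be called on the result of cgram.plot().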

Partial plot#

Clustergram.plot() can also plot only a part of the diagram, if you want to focus on a limited range of k.

cgram.plot(k_range=range(3, 10), figsize=(12, 8));

Saving plot#

Clustergram.plot() returns a matplotlib Axes object, so the figure can be saved like any other matplotlib plot:

import matplotlib.pyplot as plt

cgram.plot()
plt.savefig('clustergram.svg')
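If you keep the returned Axes, you can also save through its parent figure instead of relying on the global pyplot state, which helps when several figures are open. This is a minimal sketch: save_axes is a hypothetical helper name, the file name is an example, and a stand-in Axes replaces the clustergram so that the snippet runs on its own.

```python
import tempfile
from pathlib import Path

from matplotlib.figure import Figure


def save_axes(ax, path, **savefig_kwargs):
    """Save the figure that owns `ax` and return the written path."""
    path = Path(path)
    # ax.figure is the parent Figure; savefig infers the format from the suffix.
    ax.figure.savefig(path, bbox_inches="tight", **savefig_kwargs)
    return path


# Stand-in Axes so the snippet is self-contained. In practice, use
# ax = cgram.plot() and pass that Axes.
demo_ax = Figure().subplots()
saved = save_axes(demo_ax, Path(tempfile.gettempdir()) / "clustergram_demo.svg")
```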

Interactive plots#

Interactive plots can be generated using the Clustergram.bokeh() method. The method returns a bokeh Figure object, and it is up to you what to do with it. If you are using a Jupyter notebook, probably the best option is to show the figure directly in the cell. For that, you need to load the BokehJS interface using output_notebook() and then call show().

Bokeh is only an optional dependency, so you need to install it first:

conda install bokeh

Or

pip install bokeh
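Since bokeh is optional, code that should work with or without it can check for the package before calling the interactive method. The following sketch uses only the standard library; is_installed is a hypothetical helper name and the message text is a placeholder.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


if not is_installed("bokeh"):
    print("bokeh is not installed; only static plots are available.")
```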
from bokeh.io import output_notebook
from bokeh.plotting import show

output_notebook()
fig = cgram.bokeh()
show(fig)

This clustergram allows you to interactively zoom to specific parts of the diagram and also shows the number of observations in each cluster alongside its label. You can retrieve the labels for each observation and each iteration using Clustergram.labels.
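The per-cluster counts can also be derived from the labels themselves. The sketch below assumes that the labels are available as a pandas DataFrame with one row per observation and one column per value of k. A small hand-made table stands in for Clustergram.labels so that the snippet runs on its own, and cluster_sizes is a hypothetical helper name.

```python
import pandas as pd


def cluster_sizes(labels, k):
    """Count observations per cluster label for a given number of clusters k."""
    return labels[k].value_counts().sort_index()


# Hand-made stand-in: labels of six observations for k = 1, 2 and 3.
demo_labels = pd.DataFrame(
    {
        1: [0, 0, 0, 0, 0, 0],
        2: [0, 0, 1, 1, 1, 0],
        3: [0, 2, 1, 1, 1, 2],
    }
)

sizes = cluster_sizes(demo_labels, 3)
```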

Styling#

The Bokeh plot can be customised in much the same way as the static one, using style dictionaries.

fig = cgram.bokeh(
    size=0.5,
    line_width=0.5,
    cluster_style={"color": "lightblue", "line_color": "black"},
    line_style={"color": "red", "line_dash": "dotted", "line_cap": "butt"},
    figsize=(700, 500)
)
show(fig)

Saving#

You can also save the Bokeh plot as HTML to retain its interactivity, using save() instead of show().

from bokeh.plotting import output_file, save

output_file("clustergram.html")
save(fig)

Mean options#

On the y axis, a clustergram can use mean values, as in the original paper by Matthias Schonlau, or PCA-weighted mean values, as in the implementation by Tal Galili. PCA-weighted plots are the default because they help distinguish between different branches and make interpretation a bit easier. Both plotting backends support the same option.

cgram.plot(figsize=(12, 8), pca_weighted=True);
cgram.plot(figsize=(12, 8), pca_weighted=False);

By default, PCA-weighted clustergrams are weighted using the first principal component, but you can use any other component instead.

cgram.plot(figsize=(12, 8), pca_weighted=True, pca_component=2);
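To make the difference between the mean options more concrete, here is a rough NumPy sketch of one plausible reading of them: the unweighted value averages a cluster center across its features, while the weighted value projects the center onto a principal component of the data. This is an illustration under those assumptions, not clustergram's own code. The function names and the toy data are hypothetical, and component_index is zero-based in this sketch only.

```python
import numpy as np


def plain_means(centers):
    """Unweighted option: average each cluster center across its features."""
    return np.asarray(centers, dtype=float).mean(axis=1)


def pca_weighted_means(data, centers, component_index=0):
    """Weighted option: project each cluster center onto a principal component.

    `component_index` is zero-based here (0 = first principal component).
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.asarray(centers, dtype=float) @ vt[component_index]


# Toy data lying along the diagonal, with two cluster centers.
demo_data = np.array([[-2.0, -2.0], [-1.0, -1.0], [1.0, 1.0], [2.0, 2.0]])
demo_centers = np.array([[-1.5, -1.5], [1.5, 1.5]])

unweighted = plain_means(demo_centers)
weighted = pca_weighted_means(demo_data, demo_centers)
```

The sign of a principal component is arbitrary, so the weighted values may come out mirrored around zero; the ordering of the clusters along the axis is what carries the information.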