Additional methods¶

This notebook provides an overview of the built-in clustering performance evaluation, ways of accessing the individual labels resulting from clustering, and saving the object to disk.

Clustering performance evaluation¶

Clustergram includes handy wrappers around a selection of clustering performance metrics offered by scikit-learn. Data originally computed on a GPU are converted to NumPy arrays on the fly.

Let’s load the data and fit a clustergram on the Palmer penguins dataset. See the Introduction for its overview.

import seaborn
from sklearn.preprocessing import scale
from clustergram import Clustergram

seaborn.set(style='whitegrid')

df = seaborn.load_dataset('penguins')
data = scale(df.drop(columns=['species', 'island', 'sex']).dropna())

cgram = Clustergram(range(1, 12), verbose=False)
cgram.fit(data)

Silhouette score¶

Compute the mean Silhouette Coefficient of all samples. See scikit-learn documentation for details.

cgram.silhouette_score()

2     0.531540
3     0.447219
4     0.399584
5     0.377720
6     0.368665
7     0.335069
8     0.286170
9     0.285263
10    0.279539
11    0.273914
Name: silhouette_score, dtype: float64

Once computed, the resulting Series is available as cgram.silhouette. Calling the original method again recomputes the score.

cgram.silhouette.plot()

<AxesSubplot:>
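For intuition about what this score measures, here is a minimal pure-Python sketch of the mean silhouette coefficient on a toy one-dimensional dataset. The helper `mean_silhouette` is hypothetical and only illustrates the definition; it is not how Clustergram or scikit-learn implement it, and it assumes at least two clusters and the absolute difference as the distance.

```python
def mean_silhouette(points, labels):
    """Mean silhouette coefficient for 1-D points (illustrative only)."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # Convention: a point in a singleton cluster scores 0.
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance of p to itself is 0, so it does not affect the sum).
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(abs(p - q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Two tight, well-separated clusters give a score close to 1.
score = mean_silhouette([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1])
print(round(score, 4))  # 0.8997
```

Values near 1 indicate compact, well-separated clusters, while values near 0 indicate overlapping ones.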

Calinski and Harabasz score¶

Compute the Calinski and Harabasz score, also known as the Variance Ratio Criterion. See scikit-learn documentation for details.

cgram.calinski_harabasz_score()

2     482.191469
3     441.677075
4     400.410025
5     411.175066
6     382.297175
7     352.713169
8     331.796377
9     315.551827
10    298.178981
11    286.976897
Name: calinski_harabasz_score, dtype: float64

Once computed, the resulting Series is available as cgram.calinski_harabasz. Calling the original method again recomputes the score.

cgram.calinski_harabasz.plot()

<AxesSubplot:>

Davies-Bouldin score¶

Compute the Davies-Bouldin score. See scikit-learn documentation for details.

cgram.davies_bouldin_score()

2     0.714064
3     0.943553
4     0.944215
5     0.973248
6     0.976604
7     1.036676
8     1.175931
9     1.240331
10    1.201865
11    1.239891
Name: davies_bouldin_score, dtype: float64

Once computed, the resulting Series is available as cgram.davies_bouldin. Calling the original method again recomputes the score.

cgram.davies_bouldin.plot()

<AxesSubplot:>
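The three metrics point in different directions: higher is better for the silhouette and Calinski-Harabasz scores, while lower is better for Davies-Bouldin. The sketch below picks the preferred number of clusters under each metric from the values printed above. It assumes the ten values correspond to k = 2 through 11 (none of these metrics is defined for a single cluster); `best_k` is a hypothetical helper, not part of Clustergram.

```python
# Scores copied from the outputs above.
# Assumption: they map to k = 2..11, in order.
ks = range(2, 12)
silhouette = [0.531540, 0.447219, 0.399584, 0.377720, 0.368665,
              0.335069, 0.286170, 0.285263, 0.279539, 0.273914]
calinski_harabasz = [482.191469, 441.677075, 400.410025, 411.175066, 382.297175,
                     352.713169, 331.796377, 315.551827, 298.178981, 286.976897]
davies_bouldin = [0.714064, 0.943553, 0.944215, 0.973248, 0.976604,
                  1.036676, 1.175931, 1.240331, 1.201865, 1.239891]


def best_k(scores, higher_is_better=True):
    """Return the k whose score is best under the given direction."""
    pick = max if higher_is_better else min
    k, _ = pick(zip(ks, scores), key=lambda pair: pair[1])
    return k


best = {
    "silhouette": best_k(silhouette),
    "calinski_harabasz": best_k(calinski_harabasz),
    "davies_bouldin": best_k(davies_bouldin, higher_is_better=False),
}
print(best)
# {'silhouette': 2, 'calinski_harabasz': 2, 'davies_bouldin': 2}
```

With a fitted object, the same comparison could presumably be made directly on the stored Series, for example with `idxmax()` and `idxmin()`.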

Accessing labels¶

Clustergram stores the resulting labels for each of the tested options, which can be accessed as:

cgram.labels

	1	2	3	4	5	6	7	8	9	10	11
0	0	1	0	3	4	5	1	1	3	0	5
1	0	1	0	3	4	5	1	1	3	4	1
2	0	1	0	3	4	5	1	1	3	4	1
3	0	1	0	3	4	5	1	1	4	0	5
4	0	1	0	3	1	0	4	4	4	7	4
...	...	...	...	...	...	...	...	...	...	...	...
337	0	0	1	1	0	1	3	0	8	8	0
338	0	0	1	1	0	1	3	0	8	8	0
339	0	0	1	2	3	3	0	3	5	3	3
340	0	0	1	1	0	1	6	6	2	1	7
341	0	0	1	2	3	3	6	6	2	1	7

342 rows × 11 columns
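As a sketch of how this table might be used, the example below selects the labels of a single solution and counts the observations per cluster. It assumes, as the printed output suggests, that `cgram.labels` is a pandas DataFrame whose column names are the integer numbers of clusters. To stay self-contained it uses a small stand-in DataFrame built from a few rows of the table above rather than a fitted object.

```python
import pandas as pd

# Stand-in for cgram.labels: rows 0, 4, 337, 339 and 341 of the table above,
# restricted to the columns for 2, 3 and 4 clusters.
labels = pd.DataFrame(
    {2: [1, 1, 0, 0, 0], 3: [0, 0, 1, 1, 1], 4: [3, 3, 1, 2, 2]},
    index=[0, 4, 337, 339, 341],
)

# Labels of the 4-cluster solution, one per observation.
k4 = labels[4]

# Number of observations assigned to each cluster, ordered by cluster label.
sizes = k4.value_counts().sort_index()
print(sizes)
```

Because the rows follow the order of the fitted data, a selected column can be attached to the original observations, provided the same rows were dropped before fitting.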

Saving clustergram¶

If you want to save your computed clustergram.Clustergram object to disk, you can use the pickle library:

import pickle

with open('clustergram.pickle', 'wb') as f:
    pickle.dump(cgram, f)

with open('clustergram.pickle', 'rb') as f:
    loaded = pickle.load(f)
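The round trip can be checked by comparing the restored object with the original. The sketch below uses a plain dictionary as a hypothetical stand-in for the fitted object, and a temporary directory so that nothing is left on disk; the file name is arbitrary.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted object; any picklable object works the same way.
state = {"k_range": [1, 2, 3], "labels": {2: [0, 1, 1], 3: [0, 1, 2]}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.pickle")

    # Write the object to disk in binary mode.
    with open(path, "wb") as f:
        pickle.dump(state, f)

    # Read it back and rebuild an equivalent object.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == state)       # True: same content
print(restored is state)       # False: a new object was created
```

Note that unpickling executes code stored in the file, so only load pickle files from sources you trust. Loading a pickled Clustergram also requires the clustergram package to be importable in the loading environment.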

