Results interpretation of AgglomerativeClustering labelling

Question

First of all, I'm quite new to Python and even newer to scikit-learn, and I'm self-taught, so please forgive me if this is a banal question; it doesn't look banal to me.
So, I have the following cosine similarity matrix as a DataFrame:
       m1     m2     m3     m4     m5
m1  1.000  0.179  0.775  0.673  0.544
m2  0.299  1.000  0.333  0.521  0.232
m3  0.656  0.440  1.000  0.444  0.722
m4  0.578  0.154  0.623  1.000  0.891
m5  0.345  0.312  0.722  0.221  1.000
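Note that the matrix above is not symmetric: for example, row m4 / column m5 is 0.891, but row m5 / column m4 is 0.221. A quick way to see every such pair is to rebuild the table as a DataFrame (typed in by hand here instead of read from the Excel file) and compare it with its transpose:

```python
import numpy as np
import pandas as pd

names = ["m1", "m2", "m3", "m4", "m5"]
sim = pd.DataFrame(
    [
        [1.000, 0.179, 0.775, 0.673, 0.544],
        [0.299, 1.000, 0.333, 0.521, 0.232],
        [0.656, 0.440, 1.000, 0.444, 0.722],
        [0.578, 0.154, 0.623, 1.000, 0.891],
        [0.345, 0.312, 0.722, 0.221, 1.000],
    ],
    index=names,
    columns=names,
)

# A pairwise similarity matrix would satisfy sim(a, b) == sim(b, a).
values = sim.to_numpy()
is_symmetric = np.allclose(values, values.T)
print("symmetric:", is_symmetric)

# Collect the pairs whose two entries disagree: (a, b, sim[a, b], sim[b, a]).
mismatched = [
    (a, b, sim.loc[a, b], sim.loc[b, a])
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if not np.isclose(sim.loc[a, b], sim.loc[b, a])
]
for row in mismatched:
    print(*row)
```

Only the m3/m5 pair agrees in both directions (0.722); m4/m5 shows the largest gap.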

I want to get every merge step of the dendrogram. To do that, I wrote this function:
from sklearn.cluster import AgglomerativeClustering
import numpy as np
import pandas as pd

def clusters(sim, link_name):
    # Refit once per number of clusters, from n-1 clusters down to 1
    clusters_num = len(sim.columns) - 1

    clusters_collection = []
    while clusters_num >= 1:
        # On scikit-learn >= 1.2 this parameter is called metric (affinity was removed in 1.4)
        labels = AgglomerativeClustering(n_clusters=clusters_num, affinity='cosine', linkage=link_name).fit_predict(sim)
        clusters_collection.append(labels)
        clusters_num = clusters_num - 1

    return clusters_collection

sim_matrix = pd.read_excel(r'C:\Users\damia\OneDrive\Desktop\logistic management tool\Es sim asimmetrica\sim asimmetrica.xlsx')
sim_matrix.index = sim_matrix.columns
print(sim_matrix)

print(clusters(sim_matrix, 'average'))

The results are the following:
       m1     m2     m3     m4     m5
m1  1.000  0.179  0.775  0.673  0.544
m2  0.299  1.000  0.333  0.521  0.232
m3  0.656  0.440  1.000  0.444  0.722
m4  0.578  0.154  0.623  1.000  0.891
m5  0.345  0.312  0.722  0.221  1.000
[array([0, 3, 0, 1, 2], dtype=int64), array([0, 1, 0, 0, 2], dtype=int64), array([0, 1, 0, 0, 0], dtype=int64), array([0, 0, 0, 0, 0], dtype=int64)]

So apparently it merges m1 and m3 first, but I expected it to merge m4 and m5 first, because they have the highest similarity value (0.891).
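With `affinity='cosine'`, scikit-learn treats the DataFrame as a data matrix: each row (m1 to m5) is a 5-dimensional feature vector, and cosine distances are computed between rows, so the entries of the matrix are not used directly as pairwise similarities. This sketch recomputes those row-to-row cosine similarities with NumPy to see which pair is closest under that interpretation:

```python
import numpy as np

names = ["m1", "m2", "m3", "m4", "m5"]
X = np.array([
    [1.000, 0.179, 0.775, 0.673, 0.544],
    [0.299, 1.000, 0.333, 0.521, 0.232],
    [0.656, 0.440, 1.000, 0.444, 0.722],
    [0.578, 0.154, 0.623, 1.000, 0.891],
    [0.345, 0.312, 0.722, 0.221, 1.000],
])

# Normalize each row to unit length; dot products are then cosine similarities.
unit = X / np.linalg.norm(X, axis=1, keepdims=True)
row_cos = unit @ unit.T

# Ignore self-similarity and find the most similar pair of distinct rows.
masked = row_cos.copy()
np.fill_diagonal(masked, -np.inf)
i, j = np.unravel_index(np.argmax(masked), masked.shape)
closest = (names[i], names[j])
print(closest, round(float(row_cos[i, j]), 3))
```

The closest rows are m1 and m3 (cosine about 0.932), which matches the first merge in the output above.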
I did this exercise on paper beforehand, and with average linkage the merge order should be:

m4 + m5
m1 + m3
m1 + m3 + m4 + m5
all together
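The paper calculation can be sketched as a naive average-linkage loop that works on the similarities directly: at each step it merges the two clusters with the highest mean similarity over all cross-cluster pairs. It assumes each pair's similarity was read from the upper triangle (row a, column b, with a listed before b), so the lower-triangle values are ignored; that reading is a guess, not something stated above.

```python
import itertools

names = ["m1", "m2", "m3", "m4", "m5"]
rows = [
    [1.000, 0.179, 0.775, 0.673, 0.544],
    [0.299, 1.000, 0.333, 0.521, 0.232],
    [0.656, 0.440, 1.000, 0.444, 0.722],
    [0.578, 0.154, 0.623, 1.000, 0.891],
    [0.345, 0.312, 0.722, 0.221, 1.000],
]

# Assumption: take each pair's similarity from the upper triangle only.
pair_sim = {
    frozenset((names[i], names[j])): rows[i][j]
    for i, j in itertools.combinations(range(len(names)), 2)
}


def avg_link(c1, c2):
    # Average linkage: mean similarity over all cross-cluster pairs.
    vals = [pair_sim[frozenset((a, b))] for a in c1 for b in c2]
    return sum(vals) / len(vals)


clusters = [frozenset([n]) for n in names]
merges = []
while len(clusters) > 1:
    # Merge the two clusters with the highest average similarity.
    c1, c2 = max(itertools.combinations(clusters, 2), key=lambda p: avg_link(*p))
    score = avg_link(c1, c2)
    clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    merges.append((sorted(c1 | c2), round(score, 3)))

for members, score in merges:
    print(" + ".join(members), score)
```

Under that assumption the loop reproduces the order listed above, with linkage values of about 0.891, 0.775, 0.596 and 0.316.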
