Data Science Asked by dao.foa on July 2, 2021
First of all, I would like to say that I’m quite new to Python and even newer to scikit-learn, and I’m self-taught, so please forgive my banal question, but it doesn’t look banal to me.
So, I have the following cosine similarity matrix as a DataFrame:
m1 m2 m3 m4 m5
m1 1.000 0.179 0.775 0.673 0.544
m2 0.299 1.000 0.333 0.521 0.232
m3 0.656 0.440 1.000 0.444 0.722
m4 0.578 0.154 0.623 1.000 0.891
m5 0.345 0.312 0.722 0.221 1.000
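Note that this matrix is not symmetric: row m4 / column m5 is 0.891, while row m5 / column m4 is 0.221. The sketch below assumes the values are typed in exactly as shown rather than read from Excel. It checks which pair comes out most similar under three possible symmetrizations: copying the upper triangle, averaging mirrored entries, or taking their maximum. The helper `most_similar_pair` is hypothetical.

```python
import numpy as np
import pandas as pd

labels = ["m1", "m2", "m3", "m4", "m5"]
# The matrix exactly as shown above (rows = "from", columns = "to")
sim = pd.DataFrame(
    [[1.000, 0.179, 0.775, 0.673, 0.544],
     [0.299, 1.000, 0.333, 0.521, 0.232],
     [0.656, 0.440, 1.000, 0.444, 0.722],
     [0.578, 0.154, 0.623, 1.000, 0.891],
     [0.345, 0.312, 0.722, 0.221, 1.000]],
    index=labels, columns=labels,
)

S = sim.to_numpy()
upper = np.triu(S, 1)
candidates = {
    # Mirror the upper triangle onto the lower one
    "upper": upper + upper.T + np.eye(len(S)),
    # Average each pair of mirrored entries
    "mean": (S + S.T) / 2,
    # Keep the larger of each pair of mirrored entries
    "max": np.maximum(S, S.T),
}

def most_similar_pair(sym, names):
    """Return the off-diagonal pair with the highest similarity (hypothetical helper)."""
    masked = sym.copy()
    np.fill_diagonal(masked, -np.inf)  # ignore self-similarity
    i, j = np.unravel_index(np.argmax(masked), masked.shape)
    return tuple(sorted((names[i], names[j])))

print("symmetric?", np.allclose(S, S.T))
for name, sym in candidates.items():
    print(name, most_similar_pair(sym, labels))
```

With these values, only the upper-triangle and maximum versions make m4 and m5 the closest pair. The mean version picks m3 and m5 instead, so the convention used in the paper exercise matters.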
I want to obtain every clustering step of the dendrogram, i.e. which clusters are merged at each level. To accomplish that, I created this function:
```python
from sklearn.cluster import AgglomerativeClustering
import numpy as np
import pandas as pd

def clusters(sim, link_name):
    # Fit one flat clustering for every cluster count from n-1 down to 1
    clusters_num = len(sim.columns) - 1
    clusters_collection = []
    while clusters_num >= 1:
        # affinity='cosine' computes cosine distances between the rows of `sim`
        # (this parameter was renamed metric= in scikit-learn 1.2)
        clusters = AgglomerativeClustering(n_clusters=clusters_num, affinity='cosine', linkage=link_name).fit_predict(sim)
        clusters_collection.append(clusters)
        clusters_num = clusters_num - 1
    return clusters_collection

sim_matrix = pd.read_excel(r'C:\Users\damia\OneDrive\Desktop\logistic management tool\Es sim asimmetrica\sim asimmetrica.xlsx')
sim_matrix.index = sim_matrix.columns
print(sim_matrix)
print(clusters(sim_matrix, 'average'))
```
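With `affinity='cosine'`, scikit-learn does not read the entries of `sim_matrix` as similarities. It treats each row as a feature vector and computes the cosine distance between rows. The numpy-only sketch below reproduces that computation on the values shown above. The helper `row_cosine_distances` is hypothetical and is meant to mirror what `sklearn.metrics.pairwise.cosine_distances` returns. It suggests why m1 and m3 come out closest: their rows point in nearly the same direction.

```python
import numpy as np

labels = ["m1", "m2", "m3", "m4", "m5"]
S = np.array([
    [1.000, 0.179, 0.775, 0.673, 0.544],
    [0.299, 1.000, 0.333, 0.521, 0.232],
    [0.656, 0.440, 1.000, 0.444, 0.722],
    [0.578, 0.154, 0.623, 1.000, 0.891],
    [0.345, 0.312, 0.722, 0.221, 1.000],
])

def row_cosine_distances(X):
    """1 - cosine similarity between every pair of rows (hypothetical helper)."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # normalize each row
    return 1.0 - unit @ unit.T

D = row_cosine_distances(S)
masked = D.copy()
np.fill_diagonal(masked, np.inf)  # ignore each row's distance to itself
i, j = np.unravel_index(np.argmin(masked), masked.shape)
print("closest rows:", labels[i], labels[j], "distance:", round(float(masked[i, j]), 3))
```

Average linkage then merges on these row-vector distances. That is consistent with m1 and m3 being grouped first in the results below, even though their own similarity entries are not the largest.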
The results are the following:
m1 m2 m3 m4 m5
m1 1.000 0.179 0.775 0.673 0.544
m2 0.299 1.000 0.333 0.521 0.232
m3 0.656 0.440 1.000 0.444 0.722
m4 0.578 0.154 0.623 1.000 0.891
m5 0.345 0.312 0.722 0.221 1.000
[array([0, 3, 0, 1, 2], dtype=int64), array([0, 1, 0, 0, 2], dtype=int64), array([0, 1, 0, 0, 0], dtype=int64), array([0, 0, 0, 0, 0], dtype=int64)]
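Each array is one flat clustering (4, 3, 2, then 1 cluster), with one label per matrix row. The sketch below recovers the merge made at each step by comparing consecutive partitions. The arrays are copied from the output above, and `groups` and `merge_steps` are hypothetical helpers.

```python
import numpy as np

labels = ["m1", "m2", "m3", "m4", "m5"]
# Output of clusters(sim_matrix, 'average') as printed above
partitions = [
    np.array([0, 3, 0, 1, 2]),
    np.array([0, 1, 0, 0, 2]),
    np.array([0, 1, 0, 0, 0]),
    np.array([0, 0, 0, 0, 0]),
]

def groups(assignment, names):
    """Turn a label array into a set of groups (frozensets of item names)."""
    out = {}
    for name, label in zip(names, assignment):
        out.setdefault(label, set()).add(name)
    return {frozenset(g) for g in out.values()}

def merge_steps(assignments, names):
    """List (groups that disappeared, group that appeared) for each step."""
    previous = {frozenset([n]) for n in names}  # start from singletons
    steps = []
    for assignment in assignments:
        current = groups(assignment, names)
        steps.append((previous - current, current - previous))
        previous = current
    return steps

for removed, added in merge_steps(partitions, labels):
    merged = sorted(sorted(g) for g in removed)
    print("merged", merged, "->", sorted(next(iter(added))))
```

Read this way, the run merges m1 with m3, then adds m4, then m5, and finally m2.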
So apparently its first merge groups m1 and m3, but I was expecting it to merge m4 and m5 first, because they have the highest similarity value (0.891).
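If the goal is the merge order on the similarity values themselves, one common approach is to convert similarities to distances and let SciPy record every merge in a single call, instead of refitting once per cluster count. The sketch below uses 1 - similarity as the distance, which is an assumption. It also symmetrizes the matrix by copying its upper triangle, the convention under which m4-m5 = 0.891 is the largest entry. A different symmetrization can change the result.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

labels = ["m1", "m2", "m3", "m4", "m5"]
S = np.array([
    [1.000, 0.179, 0.775, 0.673, 0.544],
    [0.299, 1.000, 0.333, 0.521, 0.232],
    [0.656, 0.440, 1.000, 0.444, 0.722],
    [0.578, 0.154, 0.623, 1.000, 0.891],
    [0.345, 0.312, 0.722, 0.221, 1.000],
])

# Symmetrize by mirroring the upper triangle (assumption), then convert to distances
upper = np.triu(S, 1)
sym = upper + upper.T + np.eye(len(S))
D = 1.0 - sym  # symmetric, with an exactly zero diagonal

# linkage expects a condensed (1-D) distance vector
Z = linkage(squareform(D), method="average")

# Each row of Z is [cluster a, cluster b, distance, size]; ids >= n are earlier merges
n = len(labels)
members = {i: [name] for i, name in enumerate(labels)}
for step, (a, b, dist, size) in enumerate(Z):
    merged = members[int(a)] + members[int(b)]
    members[n + step] = merged
    print(f"step {step + 1}: {sorted(merged)} at distance {dist:.3f}")
```

Under this assumption, the first merge is m4 with m5, then m1 with m3. In scikit-learn, the equivalent would be to pass `D` with `affinity='precomputed'` (`metric='precomputed'` from version 1.2) and a non-ward linkage.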
I’ve done this exercise on paper before and the correct grouping order with average linkage should be: