Data Science Asked by RAbeeq on December 9, 2020
Hello, suppose I have an ndarray of linked data that looks like this:
+----+----+----+----+----+----+----+
| | p1 | p2 | p3 | p4 | p5 | p6 |
+----+----+----+----+----+----+----+
| p1 | 0 | 1 | 1 | 0 | 1 | 0 |
+----+----+----+----+----+----+----+
| p2 | 1 | 0 | 0 | 1 | 0 | 0 |
+----+----+----+----+----+----+----+
| p3 | 1 | 0 | 0 | 1 | 0 | 1 |
+----+----+----+----+----+----+----+
| p4 | 0 | 1 | 1 | 0 | 1 | 0 |
+----+----+----+----+----+----+----+
| p5 | 1 | 0 | 0 | 1 | 0 | 0 |
+----+----+----+----+----+----+----+
| p6 | 0 | 0 | 1 | 0 | 0 | 0 |
+----+----+----+----+----+----+----+
After processing the data, I get an array like this:
new = np.array([[0,1,1,0,1,0],[1,0,0,1,0,0],....,[0,0,1,0,0,0]])
I would like to cluster this data (and, of course, larger data sets of the same kind). I have read that the k-means algorithm from scikit-learn is not a good fit for this type of data because it uses Euclidean distance, so I built my own clustering algorithm. What I need to know is which distance measure is best for this type of data (linked data): is it cosine distance or something else?
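To make the comparison concrete, here is a minimal sketch that computes pairwise cosine distance and, as one possible "something else", Jaccard distance on the six rows from the table above. The choice of Jaccard as the alternative, and the helper names `pairwise_jaccard` and `pairwise_cosine`, are assumptions made for illustration only; the sketch does not claim either measure is the best one for this data.

```python
import numpy as np

# Adjacency matrix copied from the table in the question (rows p1..p6).
adjacency = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
])


def pairwise_jaccard(rows):
    """Jaccard distance between every pair of binary rows (hypothetical helper)."""
    rows = rows.astype(bool)
    # Count shared links and total distinct links for each pair of rows.
    inter = (rows[:, None, :] & rows[None, :, :]).sum(axis=2)
    union = (rows[:, None, :] | rows[None, :, :]).sum(axis=2)
    # Assumption: two rows with no links at all are treated as identical.
    with np.errstate(divide="ignore", invalid="ignore"):
        similarity = np.where(union > 0, inter / union, 1.0)
    return 1.0 - similarity


def pairwise_cosine(rows):
    """Cosine distance between every pair of rows (hypothetical helper).

    Assumes no row is all zeros, which holds for the table above.
    """
    rows = rows.astype(float)
    norms = np.linalg.norm(rows, axis=1)
    similarity = (rows @ rows.T) / np.outer(norms, norms)
    return 1.0 - similarity


jaccard = pairwise_jaccard(adjacency)
cosine = pairwise_cosine(adjacency)

print(np.round(jaccard, 2))
print(np.round(cosine, 2))
```

Both measures depend only on which links two nodes share, so p1 and p4 (identical rows) end up at distance 0, while p1 and p2 (no shared neighbours) end up at distance 1. Either matrix could then be passed to a clustering routine that accepts precomputed distances.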