Best distance measure for linked data

Data Science Asked by RAbeeq on December 9, 2020

Hello, suppose I have an ndarray of linked data that looks like this:

+----+----+----+----+----+----+----+
|    | p1 | p2 | p3 | p4 | p5 | p6 |
+----+----+----+----+----+----+----+
| p1 |  0 |  1 |  1 |  0 | 1  | 0  |
+----+----+----+----+----+----+----+
| p2 |  1 |  0 |  0 |  1 | 0  | 0  |
+----+----+----+----+----+----+----+
| p3 |  1 |  0 |  0 |  1 | 0  | 1  |
+----+----+----+----+----+----+----+
| p4 |  0 |  1 |  1 |  0 | 1  | 0  |
+----+----+----+----+----+----+----+
| p5 |  1 |  0 |  0 |  1 | 0  | 0  |
+----+----+----+----+----+----+----+
| p6 |  0 |  0 |  1 |  0 | 0  | 0  |
+----+----+----+----+----+----+----+
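The table can be written directly as a NumPy array. The sketch below (the names `A`, `is_symmetric` and `degrees` are only illustrative) also checks that the matrix is symmetric, meaning every link is recorded in both directions, so the data can be treated as an undirected graph:

```python
import numpy as np

# Adjacency matrix transcribed from the table (rows/columns p1..p6).
# A[i, j] == 1 means node p(i+1) is linked to node p(j+1).
A = np.array([
    [0, 1, 1, 0, 1, 0],  # p1
    [1, 0, 0, 1, 0, 0],  # p2
    [1, 0, 0, 1, 0, 1],  # p3
    [0, 1, 1, 0, 1, 0],  # p4
    [1, 0, 0, 1, 0, 0],  # p5
    [0, 0, 1, 0, 0, 0],  # p6
])

# Symmetric matrix -> every link appears in both directions (undirected graph).
is_symmetric = np.array_equal(A, A.T)

# Degree of each node = number of links in its row.
degrees = A.sum(axis=1)

print(is_symmetric)  # True
print(degrees)       # [3 2 3 3 2 1]
```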

After processing the data, I get an array like this:

new = np.array([[0,1,1,0,1,0], [1,0,0,1,0,0], ..., [0,0,1,0,0,0]])

I would like to cluster this data (and, of course, much larger datasets of the same kind). I read that the k-means algorithm in scikit-learn is not a good fit for this type of data because it uses Euclidean distance, so I built my own clustering algorithm. What is the best distance measure for this type of data (linked data)? Is it cosine distance or something else?
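A minimal sketch comparing a few candidate measures, assuming NumPy and SciPy are available and using the 6x6 matrix from the table: Jaccard and cosine distance between rows (nodes compared by their sets of neighbours), and shortest-path hop count (nodes compared by how far apart they are in the graph). SciPy's average-linkage hierarchical clustering is used here only as a stand-in for a custom algorithm, because it accepts any precomputed distance; it is not the method described in the question.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.sparse.csgraph import shortest_path

# Same 6x6 adjacency matrix as in the table (rows/columns p1..p6).
A = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
])

# Option 1: compare nodes by their neighbour sets (one row = one node).
# Jaccard distance = 1 - |shared neighbours| / |neighbours of either node|.
jaccard = squareform(pdist(A.astype(bool), metric="jaccard"))
# Cosine distance on the same rows (undefined for a node with no links).
cosine = squareform(pdist(A.astype(float), metric="cosine"))

# Option 2: graph (geodesic) distance = number of hops on the shortest path.
# Zero entries of a dense matrix are treated as "no link".
hops = shortest_path(A, directed=False, unweighted=True)

# Average-linkage clustering on a precomputed (condensed) distance,
# cut so that at most 2 clusters remain.
Z = linkage(pdist(A.astype(bool), metric="jaccard"), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")

print(np.round(jaccard, 2))
print(hops)
print(labels)
```

With this matrix, p1 and p4 are not linked to each other but have identical rows, so their Jaccard and cosine distances are 0 while their hop distance is 2. The Jaccard-based clustering therefore groups {p1, p4, p6} and {p2, p3, p5}, i.e. nodes that share neighbours rather than nodes that are directly linked. Which behaviour counts as "best" depends on what a cluster is supposed to mean for the data.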

Tags: distance, machine learning
