Data Science Asked on July 20, 2021
I want to compute a similarity comparison for each entry in a dataset to every other entry that is labeled as class 1 (excluding the current entry if it has a label of 1). So, consider a matrix of training data that has columns for ID and class/label, and then a bunch of data columns.
ID  Label  var1  var2  var3  ...  varN
1   1      0.26  0.44  0.2   ...  0.11
2   0      0.13  0.34  0.14  ...  0.21
3   1      0.22  0.34  0.45  ...  0.57
4   1      0.45  0.13  0.67  ...  0.78
5   0      0.32  0.76  0.11  ...  0.67
...
There are several thousand rows like this. I want to compute the similarity between each row and every other row where Label==1. So for ID==1, I would like to compute its similarity with ID==3 and ID==4; for ID==2, with ID==1, ID==3, and ID==4; and so on for every row.
Another way to think about this: I have a matrix A, and I'm forming a matrix B that is a subset of A (i.e., the rows of A where Label==1). I want to compute the similarity between A and B, but the output matrix should exclude similarities where the two entries are the same row (as indicated by ID).
Right now, I have this implemented as a for loop in R, which is unbearably slow (it takes around 10 minutes to execute for about 3000 rows).
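To pin down the output I'm after, here is a minimal sketch on the five sample rows. It is written in plain Python rather than R, purely as an illustration of the desired result and not of my actual loop. It assumes cosine similarity as the measure (any pairwise similarity could be swapped in), and it uses only the four values shown per row. The names `rows`, `cosine_similarity`, and `similarities_to_class_one` are made up for this example.

```python
import math

# Sample rows from the table above: (ID, Label, features).
# Only the four values shown per row are used; the real data has more columns.
rows = [
    (1, 1, [0.26, 0.44, 0.2, 0.11]),
    (2, 0, [0.13, 0.34, 0.14, 0.21]),
    (3, 1, [0.22, 0.34, 0.45, 0.57]),
    (4, 1, [0.45, 0.13, 0.67, 0.78]),
    (5, 0, [0.32, 0.76, 0.11, 0.67]),
]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors (an assumed measure for this sketch)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarities_to_class_one(rows):
    """Map each ID to {other_id: similarity} over rows with Label == 1,
    skipping the comparison of a row with itself."""
    positives = [(pid, feats) for pid, label, feats in rows if label == 1]
    return {
        rid: {
            pid: cosine_similarity(feats, pfeats)
            for pid, pfeats in positives
            if pid != rid  # exclude the same entry, as indicated by ID
        }
        for rid, _label, feats in rows
    }


result = similarities_to_class_one(rows)
for rid, sims in result.items():
    # Print which Label == 1 IDs each row is compared against.
    print(rid, sorted(sims))
```

So ID 1 is compared against IDs 3 and 4, ID 2 against IDs 1, 3, and 4, and so on. What I'm looking for is a way to get the same result without looping over rows one at a time.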