Stack Overflow Asked by aquamad96 on December 30, 2021
I have a dataset of feature values for some songs, i.e. something that looks like this:
   acousticness  danceability  energy  instrumentalness  key  liveness  loudness
0         0.223         0.780   0.720             0.111  1.0     0.422     0.231
1         0.400         0.644   0.880             0.555  0.5     0.660     0.555
2         0.500         0.223   0.145             0.760  0.0     0.144     0.567
...
I want to find the songs (rows) that are numerically closest to another song, such as song 0, using the Euclidean distance. So I'd like to obtain something like:
   acousticness  danceability  energy  instrumentalness  key  liveness  loudness  Euclidean to song 0
0         0.223         0.780   0.720             0.111  1.0     0.422     0.231                0.000
1         0.400         0.644   0.880             0.555  0.5     0.660     0.555                1.334
2         0.500         0.223   0.145             0.760  0.0     0.144     0.567                1.442
...
The usual approach for what you're trying to do is to use one of sklearn's pairwise metrics, such as cosine_similarity, and build a similarity matrix with it:
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances
cosine_similarity(df)
array([[1. , 0.86597679, 0.38431913],
[0.86597679, 1. , 0.71838491],
[0.38431913, 0.71838491, 1. ]])
This gives you a square matrix in which entry (i, j) is the similarity between song i and song j, with rows and columns in the same order as the dataframe's rows.
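To make it clearer what that matrix contains, here is a minimal sketch that computes the same cosine similarities by hand, using only the standard library. The `songs` list is just the three sample rows copied from the question, and `cosine` is an illustrative helper name, not part of sklearn.

```python
import math

# The three sample rows from the question
# (acousticness, danceability, energy, instrumentalness, key, liveness, loudness).
songs = [
    [0.223, 0.780, 0.72, 0.111, 1, 0.422, 0.231],
    [0.4, 0.644, 0.88, 0.555, 0.5, 0.66, 0.555],
    [0.5, 0.223, 0.145, 0.76, 0, 0.144, 0.567],
]


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# similarity[i][j] compares song i with song j, so the matrix is symmetric
# and its diagonal is 1 (up to floating-point rounding).
similarity = [[cosine(u, v) for v in songs] for u in songs]

for row in similarity:
    print([round(x, 4) for x in row])
```

Reading off row 0 of this matrix gives the similarity of every song to song 0, which is the same information the single-item call returns directly.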
Similarity with a single item
If you're only interested in the similarities with a specific song, say song 0, you can pass a second array as the second argument, so that every row of the input matrix is compared against just that one item.
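Since you mentioned the Euclidean distance, here's an example using sklearn's euclidean_distances. It returns distances, where smaller means closer, so to get a similarity-style score we subtract the result from 1 (the score is 1 for an identical song and can go negative for distant ones). If you want the actual distance, skip the subtraction and keep the resulting array as is. With the subtraction: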
1-euclidean_distances(df, df.to_numpy()[0,None])
array([[ 1. ],
[-0.16977006],
[-1.15823261]])
For the distance, just:
euclidean_distances(df, df.to_numpy()[0,None])
array([[0. ],
[1.43266989],
[2.64328432]])
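If you want to check what the distance call computes, a rough standard-library equivalent is sketched below, assuming Python 3.8+ for `math.dist`. The `songs` list holds only the three sample rows from the question. On those rows the distances from song 0 come out to roughly 0.827 (song 1) and 1.526 (song 2), so the arrays printed in this answer look like they came from different data and are best read as illustrative.

```python
import math

# The three sample rows from the question
# (acousticness, danceability, energy, instrumentalness, key, liveness, loudness).
songs = [
    [0.223, 0.780, 0.72, 0.111, 1, 0.422, 0.231],
    [0.4, 0.644, 0.88, 0.555, 0.5, 0.66, 0.555],
    [0.5, 0.223, 0.145, 0.76, 0, 0.144, 0.567],
]

query = songs[0]  # the song every other row is compared against

# Euclidean distance: square root of the summed squared differences.
distances = [math.dist(row, query) for row in songs]

# The "1 - distance" score used above; it is not bounded below,
# so it goes negative whenever the distance exceeds 1.
scores = [1 - d for d in distances]

for idx, (d, s) in enumerate(zip(distances, scores)):
    print(f"song {idx}: distance={d:.4f}, score={s:.4f}")
```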
To add the result as a new column:
df['Similarity with song 0'] = 1-euclidean_distances(df, df.to_numpy()[0,None]).squeeze()
print(df)
acousticness danceability energy instrumentalness key liveness
0 0.223 0.780 0.720 0.111 1.0 0.422
1 0.400 0.644 0.880 0.555 0.5 0.660
2 0.500 0.223 0.145 0.760 0.0 0.144
loudness Similarity with song 0
0 0.231 1.000000
1 0.555 -0.169770
2 0.567 -1.158233
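Since the goal was to find the closest songs, the last step is to rank the rows by distance. With the dataframe you could sort on the new column using `sort_values`. The sketch below shows the same idea in plain Python; `closest` is a hypothetical helper name and the dictionary holds only the three sample rows from the question.

```python
import math

# Song index -> feature values, copied from the sample rows in the question.
songs = {
    0: [0.223, 0.780, 0.72, 0.111, 1, 0.422, 0.231],
    1: [0.4, 0.644, 0.88, 0.555, 0.5, 0.66, 0.555],
    2: [0.5, 0.223, 0.145, 0.76, 0, 0.144, 0.567],
}


def closest(songs, target, k=2):
    """Return up to k (index, distance) pairs nearest to the target song."""
    query = songs[target]
    ranked = sorted(
        (math.dist(features, query), idx)
        for idx, features in songs.items()
        if idx != target  # skip the song itself, which is always at distance 0
    )
    return [(idx, dist) for dist, idx in ranked[:k]]


nearest = closest(songs, target=0)
for idx, dist in nearest:
    print(f"song {idx}: {dist:.4f}")
```

Sorting ascending by distance and sorting descending by the 1 - distance score give the same order, so either column works for ranking.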
Answered by yatu on December 30, 2021