Data Science Asked on January 25, 2021
In mathematics, a vector has both magnitude and direction.
In data science, to measure document similarity we convert each document into a feature vector, then apply the cosine formula to the feature vectors of the source and target documents.
However, the cosine formula applies only to vectors, and a vector should have both magnitude and direction. For a document that is represented as a vector, where is the direction?
From this "Cosine similarity measures the degree to which two vectors point in the same direction, regardless of magnitude.
When vectors point in the same direction, cosine similarity is 1; when vectors are perpendicular, cosine similarity is 0; and when vectors point in opposite directions, cosine similarity is -1. In positive space, cosine similarity is the complement to cosine distance: cosine_similarity = 1 - cosine_distance.
For example, the cosine similarity between [1, 2, 3] and [3, 2, 1] is 0.7143."
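The figure in the quoted example can be checked by hand. Below is a minimal sketch in plain Python, assuming the usual definition cos θ = (a · b) / (|a| |b|); the helper name `cosine_similarity` is only illustrative and is not taken from any particular library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))       # a . b
    norm_a = math.sqrt(sum(x * x for x in a))    # |a|
    norm_b = math.sqrt(sum(y * y for y in b))    # |b|
    return dot / (norm_a * norm_b)

sim = cosine_similarity([1, 2, 3], [3, 2, 1])
print(round(sim, 4))  # 0.7143, i.e. 10 / 14
```

Dividing by the two norms is what removes magnitude from the comparison: only the relative proportions of the components, which is the "direction" of the vector, affect the result.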
Also, for the angle and "direction", Google results say:
Here is another nice explanation:
https://www.machinelearningplus.com/nlp/cosine-similarity/
From this article:
"When plotted on a multi-dimensional space, where each dimension corresponds to a word in the document, the cosine similarity captures the orientation (the angle) of the documents and not the magnitude. If you want the magnitude, compute the Euclidean distance instead.
The cosine similarity is advantageous because even if the two similar documents are far apart by the Euclidean distance because of the size (like, the word ‘cricket’ appeared 50 times in one document and 10 times in another) they could still have a smaller angle between them. Smaller the angle, higher the similarity."
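To make the 'cricket' point concrete, here is a small sketch with made-up word counts. The three-word vocabulary and the counts are assumptions chosen for illustration (only the 50 vs. 10 for 'cricket' comes from the quote), and the longer document is deliberately an exact five-fold scaling of the shorter one.

```python
import math

# Hypothetical vocabulary: ["cricket", "bat", "ball"]
doc_long = [50, 10, 5]   # assumed counts in a long document
doc_short = [10, 2, 1]   # same proportions, one fifth the length

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Euclidean distance is large because the magnitudes differ
# (math.dist requires Python 3.8+)
euclid = math.dist(doc_long, doc_short)
cos_sim = cosine_similarity(doc_long, doc_short)

print(round(euclid, 2))   # 40.99
print(round(cos_sim, 4))  # 1.0
```

The two vectors are far apart by Euclidean distance, yet the angle between them is zero, so cosine similarity treats the documents as identical in direction.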
Answered by BlackCurrant on January 25, 2021