Data Science Asked by Pujan Paudel on February 14, 2021
I have a list of sentences returned as the result of a document search algorithm. I want to determine whether the results returned are semantically close/similar/coherent using some sort of metric. As a starting point, I'm using Word Mover's Distance (WMD) and calculating the similarity between the sentences. But my list of sentences is too long, and doing a pairwise comparison for all the items within a list (document) would be computationally infeasible. What might be the best way to solve this?
You could use clustering with a more basic similarity measure, for example cosine similarity or even simply the proportion of words in common (e.g. Jaccard, overlap coefficient). This should give you groups of sentences which are "quite similar" to each other, whereas sentences in different clusters are supposed to be very different. This way you would only have to compute the WMD distance within smaller groups of sentences. As you increase the number of clusters, the clusters become smaller, so fewer WMD computations are needed; however, there is a greater risk of missing a pair of similar sentences, since they could end up in different clusters.
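The two-stage idea above can be sketched in plain Python. This is only an illustration, not part of the original answer: the `jaccard` pre-filter, the greedy single-pass clustering scheme, and the `threshold=0.3` value are all assumptions chosen for simplicity — in practice you might use TF-IDF cosine similarity with k-means instead, and you would replace the final pair enumeration with actual WMD calls (e.g. gensim's `KeyedVectors.wmdistance`).

```python
from itertools import combinations

def jaccard(a, b):
    """Cheap similarity: proportion of words two sentences share."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def greedy_cluster(sentences, threshold=0.3):
    """Greedy single-pass clustering (an illustrative stand-in for a
    proper clustering algorithm): assign each sentence to the first
    cluster whose representative is similar enough, else open a new one.
    Returns clusters as lists of sentence indices."""
    clusters = []
    for i, s in enumerate(sentences):
        for cluster in clusters:
            if jaccard(sentences[cluster[0]], s) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def wmd_candidate_pairs(clusters):
    """Only within-cluster pairs need the expensive WMD computation;
    cross-cluster pairs are assumed dissimilar and skipped."""
    return [(i, j) for c in clusters for i, j in combinations(c, 2)]

sentences = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock prices fell sharply",
    "stock prices rose sharply",
]
clusters = greedy_cluster(sentences)
pairs = wmd_candidate_pairs(clusters)
# Instead of all 6 pairwise comparisons, only the within-cluster
# pairs would be passed to WMD here.
```

With the four example sentences, the two "cat" sentences and the two "stock" sentences land in separate clusters, so only 2 WMD computations remain instead of the 6 a full pairwise comparison would require — the saving grows quadratically with list size.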
Answered by Erwan on February 14, 2021