TransWikia.com

Is there any text similarity database available for phrases?

Data Science Asked by Mohit Saini on November 23, 2020

I want to train my application for phrase similarity: given two phrases, my model should predict a similarity score, as in the examples below:

International Business Machines = I.B.M
Synergy Telecom = SynTel
Beam inc = Beam Incorporate
Sir J J Smith = Johnson Smith
Alex, Julia = J Alex
James B. D. Joshi = James Joshi
James Beaty, Jr. = Beaty

Is there any dataset available to train this type of model?
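Before looking for a dataset, it can help to see how far simple string heuristics get on these examples. The following is a hypothetical baseline, not a trained model. It scores a pair in [0, 1] as the maximum of three signals: token overlap, an acronym check, and character-level similarity from Python's `difflib`. The stopword list and the function names are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed, illustrative list of legal suffixes and honorifics to ignore.
STOPWORDS = {"inc", "incorporated", "incorporate", "corp", "ltd",
             "jr", "sr", "sir", "mr", "mrs", "dr"}


def tokens(phrase: str) -> list[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", phrase.lower()) if t not in STOPWORDS]


def acronym(phrase: str) -> str:
    """First letter of each remaining token: 'International Business Machines' -> 'ibm'."""
    return "".join(t[0] for t in tokens(phrase))


def similarity(a: str, b: str) -> float:
    """Heuristic score in [0, 1]: the maximum of three simple signals."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    joined_a, joined_b = "".join(ta), "".join(tb)
    # 1. Token overlap (Jaccard), e.g. 'James B. D. Joshi' vs 'James Joshi'.
    jaccard = len(set(ta) & set(tb)) / len(set(ta) | set(tb))
    # 2. One side is the acronym of the other, e.g. 'I.B.M'.
    is_acronym = joined_a == acronym(b) or joined_b == acronym(a)
    # 3. Character-level similarity of the joined tokens, e.g. 'SynTel'.
    chars = SequenceMatcher(None, joined_a, joined_b).ratio()
    return max(jaccard, 1.0 if is_acronym else 0.0, chars)


for a, b in [("International Business Machines", "I.B.M"),
             ("Synergy Telecom", "SynTel"),
             ("James B. D. Joshi", "James Joshi")]:
    print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

A baseline like this handles the abbreviation and initials cases, but pairs such as "Sir J J Smith" / "Johnson Smith" depend on knowledge that string heuristics do not have, which is where labeled training pairs come in. The individual signals could also serve as input features for a learned model.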

2 Answers

This is a difficult problem, but definitely worth exploring.

An interesting resource to look into is DBpedia. It aims to extract structured information from the Wikipedia project. It is available under a free license (CC-BY-SA).

You can conveniently explore the project online.
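One way to turn DBpedia into training pairs is through its redirect links (the `dbo:wikiPageRedirects` property), since Wikipedia redirects often record abbreviations and alternative spellings of an entity name. The following is a minimal sketch. It assumes the public SPARQL endpoint at `https://dbpedia.org/sparql` accepts a `format` parameter for JSON output. The query, the helper names, and the URI-to-phrase conversion are illustrative choices, not an official DBpedia client.

```python
import json
import urllib.parse
import urllib.request

# Public DBpedia SPARQL endpoint (assumed reachable; only fetch_alias_pairs uses the network).
ENDPOINT = "https://dbpedia.org/sparql"
RESOURCE_PREFIX = "http://dbpedia.org/resource/"

# Each row links a redirect page (often an abbreviation or alternative spelling)
# to the article it redirects to.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?redirect ?target WHERE {
  ?redirect dbo:wikiPageRedirects ?target .
}
LIMIT 1000
"""


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """GET URL asking for results in the SPARQL 1.1 JSON results format."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    return f"{endpoint}?{params}"


def resource_to_phrase(uri: str) -> str:
    """'http://dbpedia.org/resource/Example_Corp.' -> 'Example Corp.'"""
    local = uri.removeprefix(RESOURCE_PREFIX)
    return urllib.parse.unquote(local).replace("_", " ")


def parse_alias_pairs(results_json: str) -> list[tuple[str, str]]:
    """Turn a SPARQL JSON result into (alias, canonical name) pairs."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [(resource_to_phrase(row["redirect"]["value"]),
             resource_to_phrase(row["target"]["value"]))
            for row in bindings]


def fetch_alias_pairs(query: str = QUERY) -> list[tuple[str, str]]:
    """Run the query against the live endpoint (network access required)."""
    with urllib.request.urlopen(build_request_url(query), timeout=60) as resp:
        return parse_alias_pairs(resp.read().decode("utf-8"))
```

Only `fetch_alias_pairs` touches the network, so the parsing can be tested offline. Redirects also cover misspellings and related sub-topics, so the resulting pairs are noisy positives that would need filtering. The public endpoint may cap result sizes, so a larger extract would likely need paging with `OFFSET` or the downloadable DBpedia dumps.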

Note that you are restricted to the extensive but finite knowledge in Wikipedia; for example, Synergy Telecom/SynTel does not seem to have an entry. Overcoming this limitation would take some creativity.
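One way to cover names without a Wikipedia entry, such as Synergy Telecom/SynTel, is to generate synthetic positive pairs with hand-written rules that mimic the abbreviation patterns in the question. This is a hypothetical sketch. The rule set, the suffix vocabulary, and the split into company and person rules are all assumptions, and real data would need many more rules.

```python
import re

# Hypothetical vocabularies; extend them for real data.
COMPANY_SUFFIXES = {"inc": ["Inc.", "Incorporated"],
                    "corp": ["Corp.", "Corporation"],
                    "ltd": ["Ltd.", "Limited"]}
GENERATIONAL = {"jr", "sr", "ii", "iii"}


def words(name: str) -> list[str]:
    return re.findall(r"[A-Za-z]+", name)


def core_company_words(name: str) -> list[str]:
    """Company name words without a legal suffix such as 'inc'."""
    return [w for w in words(name) if w.lower() not in COMPANY_SUFFIXES]


def acronym(name: str) -> list[str]:
    """'International Business Machines' -> ['IBM', 'I.B.M']"""
    core = core_company_words(name)
    if len(core) < 2:
        return []
    letters = [w[0].upper() for w in core]
    return ["".join(letters), ".".join(letters)]


def blend(name: str, k: int = 3) -> list[str]:
    """Portmanteau of two-word names: 'Synergy Telecom' -> ['SynTel']"""
    core = core_company_words(name)
    if len(core) != 2 or min(len(w) for w in core) <= k:
        return []
    return ["".join(w[:k].capitalize() for w in core)]


def swap_suffix(name: str) -> list[str]:
    """'Beam inc' -> ['Beam Inc.', 'Beam Incorporated']"""
    parts = name.split()
    if len(parts) < 2:
        return []
    stem, last = " ".join(parts[:-1]), parts[-1]
    return [f"{stem} {v}" for v in COMPANY_SUFFIXES.get(last.lower().rstrip("."), [])
            if v != last]


def person_words(name: str) -> list[str]:
    """Name words without generational suffixes such as 'Jr'."""
    return [w for w in words(name) if w.lower() not in GENERATIONAL]


def drop_middle(name: str) -> list[str]:
    """'James B. D. Joshi' -> ['James Joshi']"""
    p = person_words(name)
    return [f"{p[0]} {p[-1]}"] if len(p) >= 3 else []


def surname_only(name: str) -> list[str]:
    """'James Beaty, Jr.' -> ['Beaty']"""
    p = person_words(name)
    return [p[-1]] if len(p) >= 2 else []


RULES = {"company": (acronym, blend, swap_suffix),
         "person": (drop_middle, surname_only)}


def synthetic_positives(name: str, kind: str) -> list[tuple[str, str]]:
    """All (name, variant) pairs the rules for this entity kind produce."""
    return [(name, v) for rule in RULES[kind] for v in rule(name) if v != name]
```

Negative pairs can be drawn by pairing unrelated names at random. Because the rules fire whenever they can, some output (e.g. "ST" for Synergy Telecom) is questionable, so the generated pairs are best treated as noisy labels.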

Answered by Simon on November 23, 2020

This seems to correspond to entity linking or possibly named entity coreference. You might find some datasets here.

Answered by Erwan on November 23, 2020

