Is there any text similarity database available for phrases?

Question

I want to train my application for phrase similarity. The model should predict a similarity score for pairs of phrases, as in the examples below:

International Business Machines = I.B.M
Synergy Telecom = SynTel
Beam inc = Beam Incorporate
Sir J J Smith = Johnson Smith
Alex, Julia = J Alex
James B. D. Joshi = James Joshi
James Beaty, Jr. = Beaty

Is there any dataset available to train this type of model?

Simon · Answer

This is a difficult problem, but definitely worth exploring.
An interesting resource to look into is DBpedia. It aims to extract structured information from the Wikipedia project. It is available under a free license (CC-BY-SA).
You can conveniently explore the project online, e.g.:

IBM
Beam Suntory
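
To collect alternative names for an entity such as the ones above, one option is to ask DBpedia's public SPARQL endpoint for the pages that redirect to it. The following is only a minimal sketch: it assumes the endpoint at https://dbpedia.org/sparql, the `dbo:wikiPageRedirects` property, and the endpoint's `format` request parameter. The function names are made up for illustration, and the code only builds the request and parses a response without sending anything.

```python
from urllib.parse import unquote, urlencode

# Assumption: DBpedia's public SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_alias_query(resource_name):
    """Build a SPARQL query for pages that redirect to a DBpedia resource."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?page WHERE {\n"
        f"  ?page dbo:wikiPageRedirects <http://dbpedia.org/resource/{resource_name}> .\n"
        "}"
    )


def build_request_url(resource_name):
    """Build a GET URL that asks the endpoint for SPARQL JSON results."""
    params = {
        "query": build_alias_query(resource_name),
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


def alias_from_uri(uri):
    """Turn a resource URI into a readable name (last path segment, underscores to spaces)."""
    return unquote(uri.rsplit("/", 1)[-1]).replace("_", " ")


def extract_aliases(results):
    """Read alias names out of a parsed SPARQL JSON result document."""
    return [alias_from_uri(row["page"]["value"]) for row in results["results"]["bindings"]]
```

The URL from `build_request_url("IBM")` can be fetched with any HTTP client, and the decoded JSON passed to `extract_aliases`. Each (alias, entity name) pair could then serve as a positive training example.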

Note that you are restricted to the extensive but finite knowledge in Wikipedia; for example, Synergy Telecom/SynTel does not seem to have an entry. You would need some creativity to overcome this limitation.
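
For names that have no Wikipedia entry, one possible fallback is a plain string heuristic. The sketch below is a rough baseline rather than a trained model: it treats an exact initialism match as maximal similarity and otherwise uses the character-level ratio from Python's standard `difflib`. The function names and the choice to score initialisms as 1.0 are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def tokens(phrase):
    """Lowercase a phrase and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", phrase.lower())


def is_initialism(short, long):
    """True if the letters of `short` equal the first letters of the words in `long`."""
    letters = "".join(tokens(short))
    initials = "".join(word[0] for word in tokens(long))
    return len(letters) > 1 and letters == initials


def similarity(a, b):
    """Return a score in [0, 1]; initialisms count as a full match (an assumption)."""
    if is_initialism(a, b) or is_initialism(b, a):
        return 1.0
    # Otherwise compare the normalized strings character by character.
    return SequenceMatcher(None, " ".join(tokens(a)), " ".join(tokens(b))).ratio()
```

This handles pairs like "International Business Machines" and "I.B.M" and gives partial credit to "James B. D. Joshi" and "James Joshi", but it has no way to know that "J J" may stand for "Johnson". Cases like that still need external knowledge or training data.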

Erwan · Answer

This seems to correspond to entity linking or possibly named entity coreference. You might find some datasets here.
