Software Recommendations Asked by hippietrail on August 11, 2020
Many algorithms used in clustering, keyword analysis, tf-idf, and so on are based on comparative word frequency.
Usually you need to calculate word frequencies from your own corpus. Larger corpora give better estimates, but building them takes a lot of work, space, and time, and distracts from the task at hand.
I’m wondering if there are any Web API providers that have done all this for you and provide programmatic access to frequency data via the web.
Requirements:
To get the relative frequency per million words of the word "smartass", query:
https://api.datamuse.com/words?sp=smartass&md=f&max=1
It outputs:
[{"word":"smartass","score":129630,"tags":["f:0.067229"]}]
Extract the frequency from the returned JSON; note that the "score" field is NOT the count, the frequency is in the "f:" tag. For example, in Python:
    import time

    import requests

    _wait = 0.5  # seconds to sleep between retries

    def get_freq(term):
        response = None
        while True:
            try:
                response = requests.get('https://api.datamuse.com/words',
                                        params={'sp': term, 'md': 'f', 'max': 1}).json()
            except (requests.RequestException, ValueError):
                print('Could not get response. Sleep and retry...')
                time.sleep(_wait)
                continue
            break
        # The tag looks like "f:0.067229"; strip the "f:" prefix.
        freq = 0.0 if len(response) == 0 else float(response[0]['tags'][0][2:])
        return freq
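The "f:" value is a frequency per million words, so it can be scaled to other corpus sizes or put on a log scale. Here is a minimal offline sketch using the example value from the response above; the Zipf-scale convention (log10 of occurrences per billion words) is a common one in the literature, not something the Datamuse API itself provides:

```python
import math

def per_million_to_zipf(freq_per_million):
    """Convert a per-million-words frequency to the Zipf scale
    (log10 of occurrences per billion words)."""
    return math.log10(freq_per_million * 1000.0)

def expected_count(freq_per_million, corpus_words):
    """Expected number of occurrences in a corpus of the given size."""
    return freq_per_million * corpus_words / 1_000_000

# Using the example value for "smartass":
freq = 0.067229
print(round(per_million_to_zipf(freq), 2))  # roughly 1.83 on the Zipf scale
print(expected_count(freq, 10_000_000))     # about 0.67 occurrences per 10M words
```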
You can call the API up to 100,000 times a day. If you run a single process, this limit is effectively self-enforcing: the response latency keeps you at roughly 100k requests per day anyway.
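If you would rather stay under the limit explicitly than rely on response latency, a simple client-side throttle works; this is a sketch, where the 100,000/day figure comes from the answer above and the spacing arithmetic is our own:

```python
import time

DAILY_LIMIT = 100_000
SECONDS_PER_DAY = 24 * 60 * 60
# Minimum spacing between requests to stay under the daily limit
MIN_INTERVAL = SECONDS_PER_DAY / DAILY_LIMIT  # 0.864 seconds

_last_call = 0.0

def throttled(fn, *args, **kwargs):
    """Call fn, sleeping first if the previous call was too recent."""
    global _last_call
    wait = MIN_INTERVAL - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()
    return fn(*args, **kwargs)
```

Then call e.g. throttled(get_freq, 'smartass') instead of get_freq('smartass') in a loop.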
The counts are from the Google n-gram corpus.
Correct answer by Radio Controlled on August 11, 2020