August 11, 2020

Many algorithms rely on comparative word frequency, for example in clustering, keyword analysis, and tf-idf.

Usually you need to calculate word frequencies from your own corpus. Larger corpora give better estimates, but building one takes a lot of work, space, and time, and distracts from the task at hand.

I’m wondering if there are any Web API providers that have done all this for you and provide programmatic access to frequency data via the web.


  • English is a must, other languages are a big plus.
  • Gratis is better than paid, open is better than closed.
  • Optional stemming and/or lemmatization would be a plus but not required.
  • Any requirements for registration, throttling, daily limits, etc. are OK.
  • Any format is OK but urlencoded and JSON are expected.
  • Unicode support is a very strong preference.
    (Should not blow up on words like café, naïve, etc.)

To get the relative frequency of the word "smartass" per 1 million words, query:

It outputs:


Extract the result from the returned JSON, e.g. with Python like this (note that the score is NOT the count):

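import time

import requests

_wait = 0.5  # seconds to sleep before retrying a failed request

def get_freq(term):
    # NOTE: the endpoint URL is missing from the post, so the empty string below is left as-is.
    response = None
    while True:
        try:
            response = requests.get(''+term+'&md=f&max=1').json()
            break
        except (requests.RequestException, ValueError):
            print('Could not get response. Sleep and retry...')
            time.sleep(_wait)
    # The first tag carries a two-character prefix before the number; [2:] strips it.
    freq = 0.0 if len(response)==0 else float(response[0]['tags'][0][2:])
    return freq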
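
A slightly more defensive take on the extraction step, as a sketch only. It assumes the frequency tag is prefixed with 'f:' (suggested by the md=f parameter and the [2:] slice above) and searches for it instead of assuming it comes first. It also percent-encodes the term so that words like café survive in a query string. The helper names parse_freq and encode_term are my own, and the sample payload is made up to show the shape the code above expects, not real API output.

```python
from urllib.parse import quote


def encode_term(term):
    """Percent-encode a term (as UTF-8) for use in a query string."""
    return quote(term, safe='')


def parse_freq(results):
    """Return the value of the first 'f:' tag in the first result, or 0.0 if none."""
    if not results:
        return 0.0
    for tag in results[0].get('tags', []):
        if tag.startswith('f:'):
            return float(tag[2:])
    return 0.0


# Made-up payload for illustration only; the numbers are not real data.
sample = [{'word': 'example', 'score': 1000, 'tags': ['n', 'f:12.5']}]
print(parse_freq(sample))    # 12.5
print(parse_freq([]))        # 0.0
print(encode_term('café'))   # caf%C3%A9
```

If you pass the term through the params argument of requests.get instead of concatenating it into the URL, requests does the encoding for you.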

You can call this 100,000 times a day. A single process seems to stay within that limit automatically, because each response is delayed enough that throughput works out to roughly 100k responses per day.
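
If you run more than one process, or just want to be safe, you can pace the calls yourself. A rough sketch: 100,000 calls spread over 86,400 seconds is one call every 0.864 seconds. The Throttle class and min_interval function below are hypothetical helpers of mine, not part of any library, and the clock and sleep parameters exist only to make the pacing easy to test.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def min_interval(daily_limit):
    """Seconds to leave between calls to stay at or under daily_limit per day."""
    return SECONDS_PER_DAY / daily_limit


class Throttle:
    """Blocks in wait() so that successive calls are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_ok = None  # earliest time the next call is allowed

    def wait(self):
        now = self._clock()
        if self._next_ok is not None and now < self._next_ok:
            self._sleep(self._next_ok - now)
            now = self._next_ok
        self._next_ok = now + self.interval


print(min_interval(100000))  # 0.864
```

Call throttle.wait() right before each request, with throttle = Throttle(min_interval(100000)) shared by everything in the process that hits the API.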

The counts are from the Google n-gram corpus.

