Data Science Asked by utengr on November 30, 2020

I am wondering if there are any public datasets of Google news with various news categories such as politics, entertainment, lifestyle, general news, sports etc.

I want to use such a dataset for topic detection of various sentences or paragraphs. I was planning to train a classifier with such a dataset and use it for predictions. However, I couldn’t find any. Are there any such known datasets available?

3 Answers

Here is a massive dataset of news with categories which I created for exactly such a reason.

It includes all the headlines published by Times of India from 2001 to 2019, with categories.

Contains ~3 million entries.

Correct answer by Rohit on November 30, 2020

There is another big news dataset on Kaggle called All The News; you can download it here.

The data primarily falls between 2016 and July 2017, and it was scraped with Beautiful Soup from big US news sites such as the New York Times, Breitbart, CNN, Business Insider, the Atlantic, Fox News, Talking Points Memo, Buzzfeed News and many more.

Answered by Anoroah on November 30, 2020

The 20 Newsgroups dataset is available through scikit-learn, a popular ML library for Python.

It consists of Usenet postings, categorized by the newsgroup they were posted to. The group titles are not exactly "categories" like you would see on Google News, but each newsgroup is supposed to be on a specific topic as indicated by its name, so the concepts are similar. For example:

  • alt.atheism - Atheism
  • Computer Graphics
  • ...
  • Automobiles
  • Motorcycles
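
To make the idea concrete, here is a minimal sketch of how such a topic classifier could be trained with scikit-learn. The helper names `train_topic_classifier` and `load_newsgroup_texts` are made up for illustration, and TF-IDF with naive Bayes is just one reasonable baseline, not the only option. The newsgroup identifiers in the usage comment are taken from scikit-learn's copy of the dataset rather than from the list above.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def train_topic_classifier(texts, labels):
    """Fit a TF-IDF + multinomial naive Bayes pipeline on labelled texts.

    Hypothetical helper: works with any list of strings and matching labels,
    so it could also be fed news headlines with their categories.
    """
    model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
    model.fit(texts, labels)
    return model


def load_newsgroup_texts(categories, subset="train"):
    """Return (texts, label_names) for the chosen newsgroups.

    Hypothetical helper: the first call downloads the data, so it needs
    network access.
    """
    data = fetch_20newsgroups(
        subset=subset,
        categories=categories,
        # Strip metadata so the model learns from the message body only.
        remove=("headers", "footers", "quotes"),
    )
    label_names = [data.target_names[i] for i in data.target]
    return data.data, label_names


# Example usage (commented out because it triggers a download):
# groups = ["alt.atheism", "comp.graphics", "rec.autos", "rec.motorcycles"]
# texts, labels = load_newsgroup_texts(groups)
# model = train_topic_classifier(texts, labels)
# print(model.predict(["How do I replace the brake pads on my bike?"]))
```

The predicted label is a newsgroup name, so you would probably want to map those names to your own categories (politics, sports, and so on) before using the model on general news text.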

Answered by CalZ on November 30, 2020

