TransWikia.com

Scraping reddit using Python

Code Review Asked by User00257 on December 14, 2020

My objective is to find out which other subreddits the users of r/(subreddit) post in; my code is below. It works pretty well, but I am curious whether I could improve it in two ways:

First, by restricting the code so that it considers each user only once (i.e. it does not collect the posting history twice for the same user). Second, by requiring a minimum of 5 posts per user before extracting their info (i.e. if a user has written fewer than 5 posts in their Reddit life, the code would skip them).

import praw
import pandas as data
import datetime as time


reddit = praw.Reddit(client_id = 'XXXX',
                     client_secret = 'XXXX',
                     username = 'XXXX',
                     password = 'XXXX',
                     user_agent = 'XXXX')

collumns = { "User":[], "Subreddit":[], "Title":[], "Description":[], "Timestamp":[]}


for submission in reddit.subreddit("ENTER SUBREDDIT").new(limit=100):
    # Deleted accounts have no author; skip them instead of looking up a user named "None".
    if submission.author is None:
        continue
    user = submission.author

    for sub in user.submissions.new(limit=100):
        collumns["User"].append(sub.author)
        collumns["Subreddit"].append(sub.subreddit)
        collumns["Title"].append(sub.title)
        collumns["Description"].append(sub.selftext)
        collumns["Timestamp"].append(sub.created)
          

collumns_data = data.DataFrame(collumns)

def get_date(Timestamp):
    return time.datetime.fromtimestamp(Timestamp)
_timestamp = collumns_data["Timestamp"].apply(get_date)
collumns_data = collumns_data.assign(Timestamp = _timestamp)

collumns_data.to_csv('DataExport.csv')
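
The two changes asked about above (visit each user once, and skip users with fewer than 5 posts) could be sketched roughly as follows. This is only a sketch under assumptions: `collect_histories`, `fetch_posts`, and `min_posts` are hypothetical names that do not appear in the code above, and the Reddit calls are replaced by stand-in data so the example runs without credentials.

```python
def collect_histories(authors, fetch_posts, min_posts=5):
    """Return {author_name: posts} for each distinct author with enough posts.

    authors: iterable of author names; None marks a deleted account.
    fetch_posts: hypothetical callable mapping a name to that user's posts.
    """
    seen = set()      # names already handled, so each history is fetched once
    histories = {}
    for name in authors:
        if name is None or name in seen:
            continue
        seen.add(name)
        posts = list(fetch_posts(name))
        # Keep the user only if the fetched history meets the minimum.
        if len(posts) >= min_posts:
            histories[name] = posts
    return histories


# Stand-in data (an assumption, not real Reddit output).
fake_posts = {
    "alice": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "bob": ["b1", "b2"],
}
authors = ["alice", "bob", "alice", None]
result = collect_histories(authors, lambda name: fake_posts.get(name, []))
print(sorted(result))
```

With PRAW, `authors` could presumably be built from `submission.author` in the outer loop (converted to a string, or None for deleted accounts), and `fetch_posts` could wrap `reddit.redditor(name).submissions.new(limit=100)` as in the code above. The set makes the duplicate check cheap, and the length check runs on whatever the fetch returned, so at most 100 posts per user are counted.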
