Gmail to .txt file

Software Recommendations Asked by Ranga Rutiser Sundar on January 9, 2021

Is there software or a script that can download a chosen selection of emails to a single .txt file? If not, can the emails be downloaded to separate .txt files and then concatenated?

I’m looking to train a ML algorithm off of a few thousand college emails I have, but the manual copy-pasting technique fell apart after a few hundred came in.

Can a Python script, etc. access my email using POP or something and output all emails in a certain folder to a .txt file?

One Answer

You can use OfflineIMAP to download a specified folder from the remote IMAP server. For example, you could use a configuration file named .offlineimaprc.gmail that looks something like this:

# allow offlineimap access at https://myaccount.google.com/lesssecureapps

[general]
# List of accounts to be synced, separated by a comma.
accounts = main

[Account main]
# Identifier for the local repository; e.g. the maildir to be synced via IMAP.
localrepository = main-local
# Identifier for the remote repository; i.e. the actual IMAP, usually non-local.
remoterepository = main-remote
# Status cache. Default is plain, which eventually becomes huge and slow.
status_backend = sqlite

[Repository main-remote]
# Remote repos can be IMAP or Gmail, the latter being a preconfigured IMAP.
type = Gmail
remoteuser = GMAIL_USER@gmail.com
remotepass = GMAIL_PASS
sslcacertfile = /etc/ssl/certs/ca-certificates.crt

[Repository main-local]
# Currently, offlineimap only supports maildir and IMAP for local repositories.
type = Maildir
# Where should the mail be placed?
localfolders = ~/.gmail-emails
folderfilter = lambda folder: folder in ['[Gmail].Starred']

Replace the values of remoteuser and remotepass with your username and password, set localfolders to the directory where the mail should be stored, and set folderfilter to the folder or folders you want to sync. Then run offlineimap:

offlineimap -c .offlineimaprc.gmail

Downloaded mail is stored in ~/.gmail-emails/<FOLDER_NAME> in Maildir format, with each e-mail in a separate file. As for the HTML part, I use the Python script viewhtmlmail.py to display HTML e-mail from mutt in Firefox. It takes an HTML-formatted e-mail on its standard input, like this:

viewhtmlmail.py < 1567104704_1.12841.comp,U=4,FMD5=2af771a9578089448f7740e324bfbe89:2,F

The good thing about viewhtmlmail.py is that it combines all parts of an HTML-formatted e-mail and shows it in its original form, with all images embedded.
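To get from the downloaded Maildir to the single .txt file the question asks for, something along these lines might work. It is only a minimal sketch using Python's standard mailbox and email modules, not part of OfflineIMAP. The function names maildir_to_txt and message_text, the separator line, and the choice to write only the subject and the plain-text body are assumptions made for illustration. Messages that have no text/plain part (HTML only) end up with an empty body here.

```python
import email
import mailbox
import os
from email import policy


def message_text(msg):
    """Return the text/plain body of a message, or '' if it has none."""
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ""
    try:
        return part.get_content()
    except LookupError:
        # Unknown charset declared in the message; skip the body.
        return ""


def maildir_to_txt(maildir_path, out_path, separator="\n" + "=" * 72 + "\n"):
    """Write subject and plain-text body of every message in one Maildir
    folder to a single text file. Returns the number of messages written."""
    # create=False: fail instead of silently creating an empty Maildir.
    box = mailbox.Maildir(os.path.expanduser(maildir_path), create=False)
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for key in sorted(box.keys()):
            # Parse the raw bytes with the modern policy so that
            # get_body() and get_content() are available.
            msg = email.message_from_bytes(box.get_bytes(key), policy=policy.default)
            out.write(f"Subject: {msg['subject']}\n\n")
            out.write(message_text(msg))
            out.write(separator)
            count += 1
    return count
```

Assuming the configuration above, a call such as maildir_to_txt("~/.gmail-emails/<FOLDER_NAME>", "emails.txt"), with <FOLDER_NAME> replaced by the synced folder, would produce one file with all messages separated by a line of equals signs. For ML training you may want to drop the subject line or change the separator.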

Answered by Arkadiusz Drabczyk on January 9, 2021
