Python: Unwanted Truncation of strings in list after pd.DataFrame()

Question

The Data
I have data taken from a web scraper that I am trying to clean. For each webpage scraped, I exported a csv consisting of one row and 10-14 columns.
Input:
"Featured Snippet" | title | misc. content | misc. content | misc. content | website | page title | "Feedback" | "About snippets"
The misc. content cells vary from csv to csv. Sometimes there are two, three, or four. What I am trying to do is combine these middle columns into a single string.
Output:
filename | website | page title | title | content
So, my code imports each csv in a for loop as a pandas dataframe. It extracts the second column for the title, then flips the dataframe to extract the 3rd-to-last column for the website, the 4th-to-last for the page title, and the rest of the row (from the 5th-to-last column back to the first) for the content. So the content includes extra data (the title and "Featured Snippet"), but that's OK because I can clean it in Excel later. It also gets the filename as a value. It puts all these values for each csv into lists, which I combine into a dataframe at the end.
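For comparison, flipping the frame is one way to count columns from the right; negative positions in iloc reach the same cells without making a flipped copy. A small sketch, assuming a hypothetical one-row frame with made-up values in place of a real scraped csv:

```python
import pandas as pd

# Hypothetical one-row frame standing in for a single scraped csv
df = pd.DataFrame([["Featured Snippet", "A title", "part 1", "part 2",
                    "example.com", "Page title", "Feedback", "About snippets"]])

df_flipped = df.iloc[:, ::-1]

# Position 2 of the flipped frame is the 3rd-to-last column, i.e. position -3
assert df_flipped.iloc[0, 2] == df.iloc[0, -3]
# Position 3 of the flipped frame is the 4th-to-last column, i.e. position -4
assert df_flipped.iloc[0, 3] == df.iloc[0, -4]

# Flipped positions 4 onward are the original columns before the last four,
# only in right-to-left order
reversed_part = list(df_flipped.iloc[0, 4:])
in_order_part = list(df.iloc[0, :-4])
assert reversed_part == in_order_part[::-1]
print(in_order_part)
```

One side effect worth knowing: because the content is taken from the flipped frame, its pieces are joined right-to-left; slicing the original frame with iloc[0, :-4] keeps them in their original order.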
Code
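import glob
import os

import pandas as pd

files = sorted(glob.glob('*.csv'))

filenames = []
websites = []
pagetitles = []
titles = []
contents = []

for f in files:

    # Read the csv and keep only its first row
    df = pd.read_csv(f, index_col=False)
    df = df[0:1]

    # Second column: title
    title = df.iloc[:, 1]
    title = title.to_string(index=False)
    titles.append(title)

    # Reverse the column order so the trailing columns can be indexed from the front
    df_flipped = df.iloc[:, ::-1]

    # 3rd-to-last column: website
    website = df_flipped.iloc[:, 2]
    website = website.to_string(index=False)
    websites.append(website)

    # 4th-to-last column: page title
    pagetitle = df_flipped.iloc[:, 3]
    pagetitle = pagetitle.to_string(index=False)
    pagetitles.append(pagetitle)

    # 5th-to-last column back to the first (still in reversed order), empty cells dropped
    content = df_flipped.iloc[:, 4:]
    content = content.dropna(axis=1)

    # Join the remaining cells of the row into one ' // '-separated string
    content = content.apply(lambda row: ' // '.join(row.values.astype(str)), axis=1)
    contents.append(content)

    # Filename without the .csv extension
    filename = os.path.splitext(str(f))[0]
    filenames.append(filename)

snippet_data = pd.DataFrame(list(zip(filenames, websites, pagetitles, titles, contents)))
snippet_data.to_csv('datasets/black-friday-snippets.csv')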

My Problem
I've actually done everything I wanted to do, but my content keeps getting truncated. I've tried a billion variations of the .join function, tried converting the content into a bunch of different datatypes, and I've already tried about 3904312590781038941 different ways of this:
pd.set_option('display.max_columns', 50000000)
pd.set_option('display.width', 1500000000)
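For what it's worth, display.max_columns caps how many columns pandas shows and display.width caps the total line width; neither limits how many characters of each value get printed. That limit is display.max_colwidth (50 by default), and it applies to Series.to_string() as well as to print(), which matters here because the code above uses to_string() to pull out each value. A quick check with a made-up 120-character string:

```python
import pandas as pd

long_text = "x" * 120  # hypothetical long snippet
s = pd.Series([long_text])

# Raising the column count and total width does not stop per-value truncation
with pd.option_context("display.max_columns", 500, "display.width", 10000):
    limited = s.to_string(index=False)

# max_colwidth=None (pandas >= 1.0) removes the per-value limit
with pd.option_context("display.max_colwidth", None):
    unlimited = s.to_string(index=False)

print(len(limited.strip()), limited.strip().endswith("..."))
print(len(unlimited.strip()))
```

option_context restores the previous settings when the with block ends, so the change does not leak into the rest of the script.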

Also, I've written a bunch of similar code before and never had this problem.
Clues

I am using Spyder, and when I open up the content variable, I have to double-click on the row to see the full content.

Content is a Series, and contents is a list of Series. Likewise, when I open the contents variable, I have to double-click on the cell to see the full text.

Just to @#$# with my head even more, it shows the truncated version when I try print(content).

It truncates after pd.DataFrame(), but since it also truncates with the print() function, I have no idea exactly why, or how to avoid it.

Yes, I tried pd.set_option(blah blah blah). Maybe I'm not using it right.
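The clues fit together: content is a one-element Series, so contents ends up as a list of Series objects rather than strings. Turning a Series into text with print() or str() produces its display form: the index, the value cut to display.max_colwidth characters, and a dtype line. That appears to be what ends up in the csv as well when a Series sitting inside a DataFrame cell is written out, which would explain why the max_colwidth option changes the file. Taking the scalar out with .iloc[0] gives the plain Python string instead. A sketch with a hypothetical 80-character value:

```python
import pandas as pd

long_text = "y" * 80  # hypothetical joined content
# What apply(..., axis=1) returns for a one-row frame: a one-element Series
content = pd.Series([long_text])

# str() of a Series is its display form: index, (truncated) value, dtype footer
as_text = str(content)
print(as_text)

# .iloc[0] returns the underlying string, unaffected by display options
value = content.iloc[0]
print(len(value))
```

In the loop, contents.append(content.iloc[0]) would store the string itself, so nothing later has to go through the display formatting.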

santma · Answer

OK, so I figured this one out by putting pd.options.display.max_colwidth = 500 in the for loop right after pd.read_csv().
So it goes:
import glob
import os

import pandas as pd

files = sorted(glob.glob('*.csv'))

filenames = []
websites = []
pagetitles = []
titles = []
contents = []

for f in files:

    df = pd.read_csv(f, index_col=False)
    df = df[0:1]

    # The fix: allow up to 500 characters per value when pandas formats it as text
    # (setting it once before the loop would work the same way)
    pd.options.display.max_colwidth = 500

    title = df.iloc[:, 1]
    title = title.to_string(index=False)
    titles.append(title)

    df_flipped = df.iloc[:, ::-1]

    website = df_flipped.iloc[:, 2]
    website = website.to_string(index=False)
    websites.append(website)

    pagetitle = df_flipped.iloc[:, 3]
    pagetitle = pagetitle.to_string(index=False)
    pagetitles.append(pagetitle)

    content = df_flipped.iloc[:, 4:]
    content = content.dropna(axis=1)

    content = content.apply(lambda row: ' // '.join(row.values.astype(str)), axis=1)
    contents.append(content)

    filename = os.path.splitext(str(f))[0]
    filenames.append(filename)

# Zip over the list `contents` (not the last `content` Series) so every file gets a row
snippet_data = pd.DataFrame(list(zip(filenames, websites, pagetitles, titles, contents)))
snippet_data.to_csv('datasets/black-friday-snippets.csv')
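A possible variant that sidesteps the display options entirely works with scalar values instead of Series. This is a sketch under assumptions, not the method above: extract_snippet and build_table are hypothetical helper names, the column positions are copied from the question's code (3rd-to-last as website, 4th-to-last as page title, everything before that as content), and each csv is assumed to read correctly with pd.read_csv(f, index_col=False) as in the original loop. The content pieces are joined left-to-right rather than in the flipped order.

```python
import glob
import os

import pandas as pd


def extract_snippet(path):
    """Return one record for a csv, built from plain scalars (hypothetical helper)."""
    df = pd.read_csv(path, index_col=False)
    row = df.iloc[0]  # first row as a Series; row.iloc[i] gives a scalar

    # Columns before the 4th-to-last one, empty cells dropped, original order kept
    middle = row.iloc[:-4].dropna().astype(str)

    return {
        "filename": os.path.splitext(path)[0],
        "website": row.iloc[-3],      # 3rd-to-last column, as in the question
        "page title": row.iloc[-4],   # 4th-to-last column, as in the question
        "title": row.iloc[1],         # second column
        "content": " // ".join(middle),
    }


def build_table(pattern="*.csv"):
    """Collect one record per matching csv into a DataFrame with named columns."""
    records = [extract_snippet(f) for f in sorted(glob.glob(pattern))]
    return pd.DataFrame(
        records, columns=["filename", "website", "page title", "title", "content"]
    )
```

With the question's folder layout this would be called as build_table().to_csv('datasets/black-friday-snippets.csv', index=False). Because the values are plain strings rather than Series, neither to_string() nor str() of a Series is involved, so display.max_colwidth no longer affects the output.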
