Stack Overflow Asked on December 3, 2021
I am working with files in a folder, and I need a better way to loop through them and append one column from each file to build a master file. With two files, I read them into two data frames and appended the rating series. Now, however, I have more than 100 files.
File 1 looks like this:
Num  Department   Product  Salesman  Location             rating1
1    Electronics  TV       3         Bigmart, Delhi       5
2    Electronics  TV       1         Bigmart, Mumbai      4
3    Electronics  TV       2         Bigmart, Bihar       3
4    Electronics  TV       2         Bigmart, Chandigarh  5
5    Electronics  Camera   2         Bigmart, Jharkhand   5
Similarly, file 2:
Num  Department   Product  Salesman  Location             rating2
1    Electronics  TV       3         Bigmart, Delhi       2
2    Electronics  TV       1         Bigmart, Mumbai      4
3    Electronics  TV       2         Bigmart, Bihar       4
4    Electronics  TV       2         Bigmart, Chandigarh  5
5    Electronics  Camera   2         Bigmart, Jharkhand   3
What I am trying to achieve is to read the rating column from every other file and append it to the first file as a new column. Expected output:
Num  Department   Product  Salesman  Location             rating1  rating2
1    Electronics  TV       3         Bigmart, Delhi       5        2
2    Electronics  TV       1         Bigmart, Mumbai      4        4
3    Electronics  TV       2         Bigmart, Bihar       3        4
4    Electronics  TV       2         Bigmart, Chandigarh  5        5
5    Electronics  Camera   2         Bigmart, Jharkhand   5        3
I modified some of the code posted here. The following code worked:
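import os
import pandas as pd

def read_folder(folder):
    # collect the Excel files (sorted so the rating columns come out in file-name order)
    files = sorted(f for f in os.listdir(folder) if f.endswith('.xlsx'))
    # the first file supplies the shared columns plus its own rating column
    df = pd.read_excel(os.path.join(folder, files[0]))
    for f in files[1:]:
        df2 = pd.read_excel(os.path.join(folder, f))
        # column 5 (0-based) is the rating column; join it on the row index
        df = df.merge(df2.iloc[:, 5], left_index=True, right_index=True)
    return df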
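The loop above joins each file's rating column by row position (left_index/right_index), which silently pairs the wrong ratings if some file lists its rows in a different order. A minimal sketch of a key-based alternative, assuming every file has the same five descriptive columns shown in the samples; the function name merge_on_keys and the KEYS list are illustrative, not from the original post:

```python
from functools import reduce

import pandas as pd

# Assumed shared columns; adjust to match the real files.
KEYS = ['Num', 'Department', 'Product', 'Salesman', 'Location']


def merge_on_keys(frames, keys=KEYS):
    """Join a list of data frames on the shared key columns.

    Each frame is expected to hold the key columns plus one rating column
    with a distinct name (rating1, rating2, ...).
    """
    # An inner merge keeps only rows whose keys appear in every frame
    # and preserves the row order of the first frame.
    return reduce(lambda left, right: left.merge(right, on=keys, how='inner'), frames)


if __name__ == '__main__':
    base = {
        'Num': [1, 2, 3],
        'Department': ['Electronics'] * 3,
        'Product': ['TV', 'TV', 'TV'],
        'Salesman': [3, 1, 2],
        'Location': ['Bigmart, Delhi', 'Bigmart, Mumbai', 'Bigmart, Bihar'],
    }
    f1 = pd.DataFrame({**base, 'rating1': [5, 4, 3]})
    # Same rows in reverse order: a positional join would misalign them.
    f2 = pd.DataFrame({**base, 'rating2': [2, 4, 4]}).iloc[::-1]
    print(merge_on_keys([f1, f2]))
```

With the files read into a list (for example via pd.read_excel in a loop), merge_on_keys(frames) returns one frame with the key columns followed by rating1, rating2, and so on.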
This version of read_folder() returns a list of data frames. It also adds a helper column (rating_src) that records which file each rating came from.
import pandas as pd
from pathlib import Path

def read_folder(csv_folder):
    '''Input is a folder with csv files; return a list of data frames.'''
    csv_folder = Path(csv_folder).absolute()
    # sort so that rating-1, rating-2, ... follow the file-name order
    csv_files = sorted(f for f in csv_folder.iterdir() if f.suffix == '.csv')
    # the assign() method adds a helper column naming each file's ratings
    dfs = [
        pd.read_csv(csv_file).assign(rating_src=f'rating-{idx}')
        for idx, csv_file in enumerate(csv_files, 1)
    ]
    return dfs
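Now assemble the data frames into the desired shape. This assumes the rating column has the same name (for example rating) in every file, so that it is the only column left once everything else has been moved into the index: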
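dfs = read_folder(csv_folder)
dfs = (pd.concat(dfs)
       # every column except the rating becomes part of the row index
       .set_index(['Num', 'Department', 'Product', 'Salesman', 'Location', 'rating_src'])
       .squeeze(axis='columns')      # the single remaining column becomes a Series
       .unstack(level='rating_src')  # one column per source file
       .reset_index()
      )
dfs.columns.name = ''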
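To see what the concat/set_index/unstack chain produces without touching the file system, here is a small self-contained sketch that builds two data frames in memory instead of calling read_folder(). It assumes, as the chain requires, that both files name their rating column rating; the values are the first two rows of the question's sample files.

```python
import pandas as pd

KEYS = ['Num', 'Department', 'Product', 'Salesman', 'Location']

# Two in-memory stand-ins for the CSV files (assumed common column name "rating").
file1 = pd.DataFrame({'Num': [1, 2], 'Department': ['Electronics'] * 2,
                      'Product': ['TV', 'TV'], 'Salesman': [3, 1],
                      'Location': ['Bigmart, Delhi', 'Bigmart, Mumbai'],
                      'rating': [5, 4]})
file2 = file1.assign(rating=[2, 4])

# Same helper-column trick as read_folder(): tag each frame with its source.
dfs = [d.assign(rating_src=f'rating-{idx}')
       for idx, d in enumerate([file1, file2], 1)]

wide = (pd.concat(dfs)
        .set_index(KEYS + ['rating_src'])
        .squeeze(axis='columns')       # the lone "rating" column becomes a Series
        .unstack(level='rating_src')   # rating-1, rating-2 become columns
        .reset_index())
wide.columns.name = None
print(wide)
```

The same reshape can also be spelled pd.concat(dfs).pivot(index=KEYS, columns='rating_src', values='rating').reset_index(); passing a list to index requires pandas 1.1 or newer.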
Answered by jsmart on December 3, 2021
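This function reads every CSV file in a folder and returns both the list of data frames and one data frame that stacks them all (rows appended one below the other).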
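import os
import pandas as pd

def read_folder(csv_folder):
    # read every CSV file in the folder into its own data frame
    files = sorted(f for f in os.listdir(csv_folder) if f.endswith('.csv'))
    df = []
    for f in files:
        print(f)
        csv_file = os.path.join(csv_folder, f)
        df.append(pd.read_csv(csv_file))
    # stack all files vertically (one below the other) into a single data frame
    df_full = pd.concat(df, ignore_index=True)
    return df, df_full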
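os.listdir() returns folder entries in arbitrary order, so the order of the data frames (and therefore which file ends up as rating_1, rating_2, ... below) may not follow the file names. A minimal sketch, assuming the files use a .csv extension, that lists them with pathlib's glob in sorted order and keys each frame by its file name; read_csv_folder is a hypothetical name, not part of the answer:

```python
from pathlib import Path

import pandas as pd


def read_csv_folder(csv_folder):
    """Return {file stem: DataFrame} for every *.csv file, in sorted name order."""
    # glob('*.csv') skips non-CSV entries; sorted() makes the order predictable
    paths = sorted(Path(csv_folder).glob('*.csv'))
    # dicts keep insertion order (Python 3.7+), so iteration follows the sorted names
    return {p.stem: pd.read_csv(p) for p in paths}
```

For a folder holding file1.csv, file2.csv and notes.txt, read_csv_folder(folder) returns a dict with the keys 'file1' and 'file2' only, and list(frames.values()) gives the frames in that order.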
As I understand your last comment, you need to add the rating columns side by side and create one file. After reading all the files, you can do the following:
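df, df_full = read_folder(csv_folder)  # df is the list of data frames
final_df = df[0].copy()
# copy each remaining file's rating column next to the first file's columns;
# assignment aligns on the row index, so all files must list rows in the same order
for i, d in enumerate(df[1:], start=1):
    final_df["rating_" + str(i)] = d["rating"]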
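The loop above adds one column at a time and relies on every file listing its rows in the same order. A minimal sketch of the same idea as a single pd.concat(axis=1) call that also writes the master file; it assumes, as the loop does, a rating column named rating in every file, and the function name build_master and the out_path parameter are illustrative:

```python
import pandas as pd


def build_master(frames, out_path=None):
    """Put the first frame's columns next to every other frame's "rating" column."""
    # rename each extra rating Series so the new columns are rating_1, rating_2, ...
    extra = [d['rating'].rename(f'rating_{i}')
             for i, d in enumerate(frames[1:], start=1)]
    # axis=1 aligns on the row index, like the column assignment in the loop
    master = pd.concat([frames[0]] + extra, axis=1)
    if out_path is not None:
        master.to_csv(out_path, index=False)  # write the combined master file
    return master
```

Collecting the columns first and concatenating once avoids inserting into a growing frame a hundred times, which may trigger pandas' "highly fragmented" PerformanceWarning at that scale.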
Answered by ozkulah on December 3, 2021