Stack Overflow Asked by Eithar on November 7, 2021
I have one CSV file with thousands of rows; below is an example.
What I need to do is:
So I have two variables: the department name and the staff name.
I was able to do this, but it involves a lot of manual work, which shouldn't be the case. I name every CSV file myself: I have hundreds of lines, and I type the manager name into each CSV filename by hand. This has already caused some errors, and the names could change in the future.
Right now, for every department I have a groupby line and a line to save the CSV file (with the manager name entered manually in the filename).
How can this be more automated?
Many thanks
I wasn't able to understand the complete requirement, but here is an answer that should help:
Get a list of unique departments:
dept_list = list(set(df['Dept.'].tolist()))
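Note that list(set(...)) does not preserve order, so the departments come out in an arbitrary order. If a stable order matters (for example, to make runs reproducible), a minimal alternative is Series.unique(), which returns values in order of first appearance. The sample DataFrame below is hypothetical; only the Dept. column name comes from the answer.

```python
import pandas as pd

# Hypothetical sample data; only the 'Dept.' column name is taken from the answer
df = pd.DataFrame({"Dept.": ["IT", "HR", "IT", "Sales", "HR"]})

# unique() keeps first-seen order, unlike list(set(...))
dept_list = df["Dept."].unique().tolist()
print(dept_list)
```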
Now loop over the unique departments and filter the DataFrame for each one:
for dept in dept_list:
    sub_df = df.loc[df['Dept.'] == dept]
    # We want to send this to a file. The file name should be dept-officer/manager/other name.csv
    # Check whether a manager exists in this department's rows (sub_df, not the whole df)
    roles = sub_df['Role'].tolist()
    if 'Manager' in roles:
        name_employee = sub_df[sub_df['Role'] == 'Manager'].iloc[-1]['name']
        sub_df.to_csv('{}-{}.csv'.format(dept, name_employee))
    elif 'Officer' in roles:
        name_employee = sub_df[sub_df['Role'] == 'Officer'].iloc[-1]['name']
        sub_df.to_csv('{}-{}.csv'.format(dept, name_employee))
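As written, the loop saves nothing for a department that has neither a Manager nor an Officer. A minimal sketch of one way to handle that is to walk an ordered list of preferred roles and fall back to a placeholder name. The helper csv_name_for, its priority and name_col parameters, the "NotFound" placeholder and the sample rows are all hypothetical; the Dept., name and Role column names follow the answer. It only builds the filenames, so you can check them before writing anything.

```python
import pandas as pd

def csv_name_for(dept, sub_df, name_col="name", priority=("Manager", "Officer")):
    """Hypothetical helper: build '<dept>-<name>.csv' from the first role in
    `priority` present in this department's rows; fall back to 'NotFound'."""
    for role in priority:
        matches = sub_df.loc[sub_df["Role"] == role, name_col]
        if not matches.empty:
            # Like the answer above, take the last matching row
            return "{}-{}.csv".format(dept, matches.iloc[-1])
    return "{}-NotFound.csv".format(dept)

# Hypothetical sample data
df = pd.DataFrame({
    "Dept.": ["IT", "IT", "HR", "HR", "Sales"],
    "name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
    "Role": ["Officer", "Manager", "Officer", "Admin", "Admin"],
})

filenames = {}
for dept in df["Dept."].unique():
    sub_df = df.loc[df["Dept."] == dept]
    filenames[dept] = csv_name_for(dept, sub_df)
    # sub_df.to_csv(filenames[dept], index=False)  # uncomment to actually write the files
print(filenames)
```

Here IT resolves to its Manager, HR falls back to its Officer, and Sales gets the placeholder instead of being silently skipped.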
Answered by Sid on November 7, 2021
It sounds like you're nearly there with the groupby. How about adding a custom function that builds the CSV name depending on what it finds in each group?
import pandas as pd
import numpy as np

np.random.seed(0)
df = pd.DataFrame(data={
    "Dept": np.random.choice(["IT", "HR", "Sales"], 20),
    "Staff": ["name" + num for num in np.random.randint(0, 5, 20).astype(str)],
    "Role": np.random.choice(a=["Manager", "Officer", "Admin"], size=20, p=[0.1, 0.3, 0.6]),
    "Comment": [None] * 20
})

def to_csv(group):
    # Name the file after the department and its Manager (or, failing that, its Officer)
    roles = group["Role"].tolist()
    dept = group["Dept"].iloc[0]
    staff_name = "NotFound"
    if "Manager" in roles:
        staff_name = group["Staff"].iloc[roles.index("Manager")]
    elif "Officer" in roles:
        staff_name = group["Staff"].iloc[roles.index("Officer")]
    group.to_csv(f"{dept}-{staff_name}.csv", index=False)

df.groupby("Dept").apply(to_csv)
list.index() returns the position of the first match, which you can use to grab the name at that position in the group. It might not be the fastest approach, but hopefully it gets the job you have in mind done.
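Because the function is used only for its side effect (writing a file), a minimal alternative sketch is to iterate over the groupby object directly instead of calling apply; recent pandas versions may also emit a deprecation warning when apply operates on the grouping column. The sketch keeps the "in" check before list.index(), since list.index() raises ValueError when the value is absent. The helper name save_by_dept, the sample rows and the temporary output directory are illustrative assumptions; the Dept, Staff and Role column names follow the answer above.

```python
import os
import tempfile

import pandas as pd

def save_by_dept(df, out_dir):
    """Hypothetical helper: write one CSV per department, named '<Dept>-<Staff>.csv',
    and return the list of file names written."""
    written = []
    for dept, group in df.groupby("Dept"):  # groups come out sorted by Dept
        roles = group["Role"].tolist()
        staff_name = "NotFound"
        for role in ("Manager", "Officer"):
            if role in roles:  # guard: list.index raises ValueError if role is absent
                staff_name = group["Staff"].iloc[roles.index(role)]
                break
        path = os.path.join(out_dir, f"{dept}-{staff_name}.csv")
        group.to_csv(path, index=False)
        written.append(os.path.basename(path))
    return written

# Hypothetical sample data
df = pd.DataFrame({
    "Dept": ["IT", "HR", "IT", "HR"],
    "Staff": ["name1", "name2", "name3", "name4"],
    "Role": ["Admin", "Officer", "Manager", "Admin"],
})

# Write into a throwaway directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as out_dir:
    written = save_by_dept(df, out_dir)
print(written)
```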
Answered by gherka on November 7, 2021