Create many CSV Files based on Pandas df column value

Question

I have one csv file with thousands of rows , below is example.
What I need to do is :

Iterate over the pandas df

create separate file for every department with list of staff(groupby or .loc)
To name the file :
If manager is found then name the file with his name
if no manager , then look for officer
if no manager or officer , then write a comment in the last column
the file should be named as this( IT-name2.csv) as name2 staff is manager in the below picture/example.

So I have two variables , dept. name and staffname
I was able to do this but there is lots of manual work , it should not be the case. I name every csv file myself , I have 100s of lines and I add the managername in the csv filename myself , which caused some errors and it could be changed in the future
Now , for every department I have groupby line, and line to save the csv file (manually enter the managername in the filename)
How this can be more automated ?
Many thanks
.

Sid · Answer

I wasn't able to understand the complete requirement, posting an answer which should help:
Get a list of unique departments:
dept_list = list(set(df['Dept.'].tolist()))

Now we want to run through the unique only department list and do some manipulation of the dataframe:
for dept in dept_list:
    sub_df = df.loc[df['Dept.'] == dept]
    # We want to send this to a file. The file name should be dept-officer/manager/other name.csv
    # Check if manager exists in sub_df['Role']
    if 'Manager' in df['Role'].tolist():
         name_employee = sub_df[subdf['Role']=='Manager'].iloc[-1]['name']
         sub_df.to_csv('{}-{}.csv' .format(dept, name_employee))
    elif 'Officer' in df['Role'].tolist():
         name_employee = sub_df[subdf['Role']=='Officer'].iloc[-1]['name']
         sub_df.to_csv('{}-{}.csv' .format(dept, name_employee))

gherka · Answer

It sounds like you're nearly there with the groupby. How about adding a custom function to modify the csv name depending on what you find in the groupby?
import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(data={
    "Dept"    : np.random.choice(["IT", "HR", "Sales"], 20),
    "Staff"   : ["name" + num for num in np.random.randint(0,5,20).astype(str)],
    "Role"    : np.random.choice(a=["Manager", "Officer", "Admin"], size=20, p=[0.1, 0.3, 0.6]),
    "Comment" : [None] * 20
})

def to_csv(group):
    
    roles = group["Role"].tolist()
    dept = group["Dept"].iloc[0]
    staff_name = "NotFound"
    
    if "Manager" in roles:
        staff_name = group["Staff"].iloc[roles.index("Manager")]
    elif "Officer" in roles:
        staff_name = group["Staff"].iloc[roles.index("Officer")]
        
    group.to_csv(f"{dept}-{staf_name}.csv", index=False)

df.groupby("Dept").apply(to_csv)

list().index() will return the position of the first match which you can use to grab the name in that position from the group. It might not be the fastest thing in the world, but hopefully will get the job you have in mind done.

Create many CSV Files based on Pandas df column value

2 Answers

Add your own answers!

Ask a Question