TransWikia.com

Merging two DataFrames using column names of one of the DataFrames

Stack Overflow Asked on December 27, 2021

Task Description

I have a bunch of DataFrames I read in from sheets in an xlsx file. Is it possible to merge df into df2 and create a column in df2 which is the sheet name stored in the loop.

I read the df in as follows:

for idx, sheet_name in enumerate(excel_file.sheet_names):
    df = excel_file.parse(sheet_name)

    # do some stuff

I wish to merge df with df2 within this loop.

If we just do it for an individual sheet, for example, revenue, is it possible to create df3?

df

 Date            AR        AU      GB       US
1983-03-31     0.001   0.053206  0.001  0.160159
1983-04-30     0.001   0.053206  0.001  0.160159
1983-05-31     0.001   0.053206  0.001  0.160159

df2

 Date.        a   ...  z   loc     
1983-03-31                  AR
1983-03-31                  AU
1983-03-31                  GB
1983-03-31                  US
1983-04-30                  AR
1983-04-30                  AU
1983-04-30                  GB
1983-04-30                  US
1983-05-31                  AR
1983-05-31                  AU
1983-05-31                  GB 
1983-05-31                  US

Desired DataFrame
df3

 Date.        a   ...  z   loc   revenue 
1983-03-31                  AR   0.001
1983-03-31                  AU   0.053206
1983-03-31                  GB   0.001
1983-03-31                  US   0.160159
1983-04-30                  AR   0.001
1983-04-30                  AU   0.053206
1983-04-30                  GB   0.001
1983-04-30                  US   0.160159
1983-05-31                  AR   0.001
1983-05-31                  AU   0.053206
1983-05-31                  GB   0.001
1983-05-31                  US   0.160159

2 Answers

Best way out, melt df and merge with df.

 df3= print(df2.merge(pd.melt(df, id_vars=['Date'], 
                  var_name='loc', value_name='Revenue'),how='left', on=['Date','loc']))

print(df3)

     Date loc   Revenue
0   1983-03-31  AR  0.001000
1   1983-03-31  AU  0.053206
2   1983-03-31  GB  0.001000
3   1983-03-31  US  0.160159
4   1983-04-30  AR  0.001000
5   1983-04-30  AU  0.053206
6   1983-04-30  GB  0.001000
7   1983-04-30  US  0.160159
8   1983-05-31  AR  0.001000
9   1983-05-31  AU  0.053206
10  1983-05-31  GB  0.001000
11  1983-05-31  US  0.160159

Answered by wwnde on December 27, 2021

IIUC, DataFrame.lookup

df3 = df2.copy()
df3['revenue'] = df.set_index('Date').lookup(df2['Date'], df2['loc'])
print(df3)

If there are missing value in df for any 'Date' , 'loc' in df2:

df3 = (df.melt('Date', var_name = 'loc', value_name='revenue')
         .merge(df2, on=['Date' , 'loc'], how='right'))


print(df3)

Output

          Date loc   revenue
0   1983-03-31  AR  0.001000
1   1983-03-31  AU  0.053206
2   1983-03-31  GB  0.001000
3   1983-03-31  US  0.160159
4   1983-04-30  AR  0.001000
5   1983-04-30  AU  0.053206
6   1983-04-30  GB  0.001000
7   1983-04-30  US  0.160159
8   1983-05-31  AR  0.001000
9   1983-05-31  AU  0.053206
10  1983-05-31  GB  0.001000
11  1983-05-31  US  0.160159

Answered by ansev on December 27, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP