Stack Overflow Asked by Essegn on January 10, 2021
I would like to write mean values from one dataframe (df1) to another (dfmaster ).
Manually i can manage it, but i would like to automate the process in that way, that it will be read all the columns names from the df1 (as variable) and those variable will be used in the code below, to calculate mean of all columns from the dataframe (df1).
From this dataframe should be the mean of columns calculated:
import pandas as pd
data = [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]]
df1 = pd.DataFrame(data, columns = ['Prozess233', 'Prozess234', 'Prozess235'])
df1
This is the master dataframe where the means should be stored:
data = [['Prozess233','NaN', 'NaN','NaN'], ['Prozess234','NaN', 'NaN', 'NaN'], ['Prozess235','NaN', 'NaN', 'NaN']]
dfmaster = pd.DataFrame(data, columns = ['Process', 'Mean', 'St.Dev', 'Max'])
dfmaster
Here is the code to calculate the mean from one column of df1 and store the value in the master dataframe:
index = dfmaster.loc[dfmaster['Process'] == 'Prozess233'].index[0]
keep_col = ['Prozess233']
df1 = df1[keep_col]
df1 = df1[df1['Prozess233'].notna()]
meanPR = df1["Prozess233"].mean()
dfmaster.at[index, 'Mean'] = meanPR
This is the result:
I would like to have a loop, that the code above (to store the mean into the master dataframe) will perform for all the columns of the dataframe df1 – the mean for the "Prozess234" and for the "Prozess235".
I couldn`t figure out, how to store the column names (df1), in order to use them in loop.
I am not sure, if this could be right approach.
Instead of adding to the dfmaster one by one just create it using vectorized methods:
import pandas as pd
data = [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]]
df1 = pd.DataFrame(data, columns = ['Prozess233', 'Prozess234', 'Prozess235'])
dfmaster=pd.concat([df1.mean(), df1.std(), df1.max()], axis=1).reset_index()
dfmaster.columns = ['Process','Mean', 'St.Dev', 'Max']
#dfmaster
Process Mean St.Dev Max
0 Prozess233 6.4 0.200000 6.6
1 Prozess234 13.0 2.645751 15.0
2 Prozess235 9.0 3.605551 13.0
Also, depending on your needs consider checking out df1.describe()
's output:
Prozess233 Prozess234 Prozess235
count 3.0 3.000000 3.000000
mean 6.4 13.000000 9.000000
std 0.2 2.645751 3.605551
min 6.2 10.000000 6.000000
25% 6.3 12.000000 7.000000
50% 6.4 14.000000 8.000000
75% 6.5 14.500000 10.500000
max 6.6 15.000000 13.000000
Answered by noah on January 10, 2021
You can use agg to get specific aggregations for each column:
df1_summary = (df1.agg(["mean", "std", "max"])
.rename(index={"mean": "Mean", "std": "St.Dev", "max": "Max"}))
print(df1_summary)
Prozess233 Prozess234 Prozess235
Mean 6.4 13.000000 9.000000
St.Dev 0.2 2.645751 3.605551
Max 6.6 15.000000 13.000000
Then if you want to fill this into your dfmaster
dfmaster = dfmaster.set_index("Process")
dfmaster.update(df1_summary.T)
print(dfmaster)
Mean St.Dev Max
Process
Prozess233 6.4 0.200000 6.6
Prozess234 13.0 2.645751 15.0
Prozess235 9.0 3.605551 13.0
Answered by Cameron Riddell on January 10, 2021
While you can get the columns for a dataframe with df.columns
, there's almost never a good reason to iterate over a pandas dataframe for simple mathematical calculations.
What you're after can be done with
df1.T.stack().groupby(level=0).agg({np.mean,np.std, max})
mean std max
Prozess233 6.4 0.200000 6.6
Prozess234 13.0 2.645751 15.0
Prozess235 9.0 3.605551 13.0
To break it down further:
transpose the dataframe
dft=df1.T
dft
0 1 2
Prozess233 6.2 6.4 6.6
Prozess234 10.0 15.0 14.0
Prozess235 8.0 13.0 6.0
stack the dataframe
dfs=dft.stack()
dfs
Prozess233 0 6.2
1 6.4
2 6.6
Prozess234 0 10.0
1 15.0
2 14.0
Prozess235 0 8.0
1 13.0
2 6.0
dtype: float64
group and aggregate
dfmaster=dfs.groupby(level=0).agg({np.mean,np.std, max})
dfmaster
mean std max
Prozess233 6.4 0.200000 6.6
Prozess234 13.0 2.645751 15.0
Prozess235 9.0 3.605551 13.0
Answered by G. Anderson on January 10, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP