Data Science Asked by rwamit on January 2, 2021
I have a question about data manipulation in pandas.
Let's say I have a dataframe, df, with the following structure:
A  B    C
1  1    7
5  3    3
3  3    2
7  5    2
5  NaN  2
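The dataframe has three columns: A, B and C.
Column B holds trailing means of column A: each value of B is the mean of the value of A in the same row and (where available) the two rows before it.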
For example:
The value of B in the 3rd row (which is 3) is the mean of the first 3 rows of A: (1 + 5 + 3)/3 = 9/3 = 3.
Similarly, the value of B in the 4th row is (sum of the values of A in the 2nd, 3rd and 4th rows)/3 = (5 + 3 + 7)/3 = 5.
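The rule above can be written out directly as a plain loop. This is a minimal sketch, assuming that the first rows (which have fewer than two predecessors) simply average whatever values are available; the table is consistent with that, since B is 1 in row 1 and (1 + 5)/2 = 3 in row 2. The function name trailing_mean is hypothetical.

```python
def trailing_mean(values, window=3):
    """Hypothetical helper: mean of each value and up to window-1 values before it."""
    result = []
    for i in range(len(values)):
        start = max(0, i - window + 1)  # the window is shorter at the start of the list
        chunk = values[start:i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


a = [1, 5, 3, 7, 5]  # column A from the example
print(trailing_mean(a))  # [1.0, 3.0, 3.0, 5.0, 5.0]
```

Under this reading, the missing B value in the last row would be (3 + 7 + 5)/3 = 5.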
Now, suppose B has many NaN values while A has none. How can I write code to fill the NaN values in B according to the logic above?
I tried using loc and iloc, but I must have made a mistake somewhere.
Assuming column B has no NaNs in its first two entries (and df uses the default integer index, so labels equal positions), the following code works:
import numpy as np
import pandas as pd
index_nan = df.index[df['B'].isna()]  # get all indices where B has NaNs
new_df = pd.DataFrame({'B': [np.mean(df['A'].iloc[i-2:i+1]) for i in index_nan]}, index=index_nan)  # mean of A over rows i-2..i
df.update(new_df)  # update only those values of column B in df (in place)
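A self-contained run of this approach on the example frame from the question might look like the following. It is a sketch that assumes the default RangeIndex, so that index labels coincide with positions and iloc slicing matches the answer's i-2:i+1 window.

```python
import numpy as np
import pandas as pd

# Example frame from the question; the last value of B is missing.
df = pd.DataFrame({
    'A': [1, 5, 3, 7, 5],
    'B': [1, 3, 3, 5, np.nan],
    'C': [7, 3, 2, 2, 2],
})

# Labels of rows where B is NaN (equal to positions under the default RangeIndex).
index_nan = df.index[df['B'].isna()]

# For each such row, the mean of A over that row and the two rows before it.
new_df = pd.DataFrame(
    {'B': [df['A'].iloc[i - 2:i + 1].mean() for i in index_nan]},
    index=index_nan,
)

# update() modifies df in place with the non-NaN values of new_df, aligned on index and columns.
df.update(new_df)
print(df['B'].tolist())  # [1.0, 3.0, 3.0, 5.0, 5.0]
```

Note that this slices A once per missing value, which is why a vectorized alternative can be faster when B has many NaNs.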
Correct answer by Namita on January 2, 2021
Thank you for the above answer!
That definitely works. However, I found a more computationally efficient way using the pandas rolling method (Series.rolling):
import numpy as np
df['D'] = df['A'].rolling(window=3, min_periods=1).mean()  # trailing mean of up to 3 values of A
df['B'] = np.where(df['B'].isnull(), df['D'], df['B'])  # take D only where B is missing
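Series.rolling computes statistics over a sliding window of the last n values. Here .mean() with window=3 averages the current and the two previous values of A, and min_periods=1 lets the first two rows use whatever values are available.
Answered by rwamit on January 2, 2021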
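The same result can probably be obtained without the helper column D by passing the rolling mean to fillna, which fills NaNs in B from a Series aligned on the index. This is a sketch of an alternative, not the code from the answer; it uses the example frame from the question with one extra NaN added in the 3rd row for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 5, 3, 7, 5],
    'B': [1, 3, np.nan, 5, np.nan],  # extra NaN in the 3rd row for illustration
    'C': [7, 3, 2, 2, 2],
})

# Trailing mean of up to 3 values of A; min_periods=1 keeps the first rows from being NaN.
rolling_mean = df['A'].rolling(window=3, min_periods=1).mean()

# Fill only the missing entries of B; existing values are left untouched.
df['B'] = df['B'].fillna(rolling_mean)
print(df['B'].tolist())  # [1.0, 3.0, 3.0, 5.0, 5.0]
```

The rolling mean is computed once for the whole column, so the cost does not grow with the number of NaNs in B the way a per-row loop does.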