Stack Overflow Asked by Ribit8950 on December 4, 2021
Excuse the title, I’m not even sure how to label what I’m trying to do. I have data in a DataFrame that looks like this:
Name Month Status
---- ----- ------
Bob Jan Good
Bob Feb Good
Bob Mar Bad
Martha Feb Bad
John Jan Good
John Mar Bad
Not every ‘Name’ has a row for every ‘Month’. What I want to get is:
Name Month Status
---- ----- ------
Bob Jan Good
Bob Feb Good
Bob Mar Bad
Martha Jan N/A
Martha Feb Bad
Martha Mar N/A
John Jan Good
John Feb N/A
John Mar Bad
Where the missing months are filled in with a placeholder value (such as N/A) in the ‘Status’ column.
What I’ve tried so far is to export all of the unique ‘Month’ values to a list, convert it to a DataFrame, then join/merge the two DataFrames, but I can’t get anything to work.
What is the best way to do this?
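The merge-based approach described above can be made to work. The following is only a minimal sketch, assuming pandas 1.2 or later (needed for how="cross"); the names names, months, grid and out are illustrative and not part of the original post:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Martha", "John", "John"],
    "Month": ["Jan", "Feb", "Mar", "Feb", "Jan", "Mar"],
    "Status": ["Good", "Good", "Bad", "Bad", "Good", "Bad"],
})

# Unique values of each key column, in order of first appearance
names = df[["Name"]].drop_duplicates()
months = df[["Month"]].drop_duplicates()

# Every (Name, Month) combination (assumes pandas >= 1.2 for how="cross")
grid = names.merge(months, how="cross")

# Left-join the original rows onto the full grid; unmatched rows get NaN
out = grid.merge(df, on=["Name", "Month"], how="left")

# Replace the missing statuses with a placeholder string
out["Status"] = out["Status"].fillna("N/A")
print(out)
```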
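Use pivot, then stack with dropna=False so that the missing combinations are kept:
# Note: newer pandas releases change how stack handles dropna; check the stack documentation for your version.
df = df.pivot(index='Name', columns='Month', values='Status').stack(dropna=False).to_frame('Status').reset_index()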
Name Month Status
0 Bob Feb Good
1 Bob Jan Good
2 Bob Mar Bad
3 John Feb NaN
4 John Jan Good
5 John Mar Bad
6 Martha Feb Bad
7 Martha Jan NaN
8 Martha Mar NaN
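If the pivot/stack chain does not run on your pandas version, a similar result can probably be obtained with pivot followed by melt. This is a swapped-in variant rather than the code above, shown only as a sketch; the variable names wide and out are illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Martha", "John", "John"],
    "Month": ["Jan", "Feb", "Mar", "Feb", "Jan", "Mar"],
    "Status": ["Good", "Good", "Bad", "Bad", "Good", "Bad"],
})

# Wide table: one row per Name, one column per Month, NaN where no row exists
wide = df.pivot(index="Name", columns="Month", values="Status")

# Back to long form; melt keeps the NaN cells instead of dropping them
out = wide.reset_index().melt(id_vars="Name", var_name="Month", value_name="Status")

# Replace the missing statuses with a placeholder string
out["Status"] = out["Status"].fillna("N/A")
print(out)
```

Note that the rows come out grouped by Month rather than by Name; sort afterwards if the order matters.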
Answered by BENY on December 4, 2021
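You can treat the month as a categorical column, then allow GroupBy to do the heavy lifting:
df['Month'] = pd.Categorical(df['Month'])
# observed=False asks for unobserved categories explicitly, so this does not depend on the version's default
df.groupby(['Name', 'Month'], as_index=False, observed=False).first()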
Name Month Status
0 Bob Feb Good
1 Bob Jan Good
2 Bob Mar Bad
3 John Feb NaN
4 John Jan Good
5 John Mar Bad
6 Martha Feb Bad
7 Martha Jan NaN
8 Martha Mar NaN
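The secret sauce here is that, when grouping on a categorical column, pandas emits a row for every category, including the unobserved ones, and fills their values with NaN.
Caveat: This always sorts your data by the group keys, with Month in category order (alphabetical by default, hence Feb before Jan).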
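To get calendar order rather than alphabetical order, the categories can be listed explicitly. A minimal sketch, assuming the three months in the sample data are the only ones that matter; the list months and the name out are illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Martha", "John", "John"],
    "Month": ["Jan", "Feb", "Mar", "Feb", "Jan", "Mar"],
    "Status": ["Good", "Good", "Bad", "Bad", "Good", "Bad"],
})

# Assumed calendar order for the months present in the sample
months = ["Jan", "Feb", "Mar"]
df["Month"] = pd.Categorical(df["Month"], categories=months, ordered=True)

# observed=False keeps (Name, Month) pairs that never occur in the data
out = df.groupby(["Name", "Month"], as_index=False, observed=False).first()

# Replace the missing statuses with a placeholder string
out["Status"] = out["Status"].fillna("N/A")
print(out)
```

The result is still sorted by Name, so the original row order of the names is not preserved.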
Answered by cs95 on December 4, 2021
You have to take advantage of pandas' indexing to reshape the data:
Step 1: create a new index from the unique values of the Name and Month columns:
new_index = pd.MultiIndex.from_product(
    (df.Name.unique(), df.Month.unique()), names=["Name", "Month"]
)
Step 2: set Name and Month as the new index, reindex with new_index, and call reset_index to get your final output:
df.set_index(["Name", "Month"]).reindex(new_index).reset_index()
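Put together, the two steps can be sketched as below. The fill_value="N/A" argument is an addition not in the steps above; it is one way to get the placeholder the question asks for instead of NaN. The name out is illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Martha", "John", "John"],
    "Month": ["Jan", "Feb", "Mar", "Feb", "Jan", "Mar"],
    "Status": ["Good", "Good", "Bad", "Bad", "Good", "Bad"],
})

# Step 1: every (Name, Month) combination, in order of first appearance
new_index = pd.MultiIndex.from_product(
    (df["Name"].unique(), df["Month"].unique()), names=["Name", "Month"]
)

# Step 2: reindex against the full grid; fill_value supplies the placeholder
out = (
    df.set_index(["Name", "Month"])
    .reindex(new_index, fill_value="N/A")
    .reset_index()
)
print(out)
```

Because unique() preserves the order of first appearance, this keeps the names and months in the order they occur in the data rather than sorting them.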
UPDATE 2021/01/08:
You can use the complete function from pyjanitor; at the moment you have to install the latest development version from GitHub:
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import pandas as pd
import janitor  # the pyjanitor package is imported under the name "janitor"
df.complete("Name", "Month")
Answered by sammywemmy on December 4, 2021