Stack Overflow Asked by Chris Dixon on January 29, 2021
Unpredictably formatted df
:
First Name number last_name
0 Cthulhu 666 Smith
df = pd.DataFrame({'First Name': ['Cthulhu'], 'number': [666], 'last_name': ['Smith']})
This needs to be put into column names and order: TemplateColumns = ['First Name', 'other', 'number']
. If columns don’t exist they can be created:
for col in TemplateColumns:
if col not in df:
df[col] = np.nan
Which gives:
First Name number last_name other
0 Cthulhu 666 Smith NaN
And initial columns need to be ordered the same as TemplateColumns
, leaving the remaining columns at the end, to get desired_df
:
First Name other number last_name
0 Cthulhu NaN 666 Smith
desired_df = pd.DataFrame({'First Name': ['Cthulhu'], 'other': [np.nan], 'number': [666], 'last_name': ['Smith']})
Reordering columns is well explained in other posts, but I don’t know how to order the first n columns and keep the rest at the end. How can I do this?
Try this
cols = TemplateColumns + df.columns.difference(TemplateColumns, sort=False).tolist()
df_final = df.reindex(cols, axis=1)
Out[714]:
First Name other number last_name
0 Cthulhu NaN 666 Smith
Answered by Andy L. on January 29, 2021
You can write your own function to achieve this. Essentially you can use .reindex()
to reorder the dataframe while including empty columns if they don't exist. The only remaining part to figure out would be how to add the remaining columns not in TemplateColumns
to your dataframe. You can do this by obtaining the set difference of the column index from the TemplateColumns
then updating the order before your call to .reindex
Set up data & function
def reordered(df, new_order, include_remaining=True):
cols_to_end = []
if include_remaining:
# gets the items in `df.columns` that are NOT in `new_order`
cols_to_end = df.columns.difference(new_order, sort=False)
# Ensures that the new_order items are first
final_order = new_order + list(cols_to_end)
return df.reindex(columns=final_order)
df = pd.DataFrame({'First Name': ['Cthulhu'], 'number': [666], 'last_name': ['Smith']})
new_order = ['First Name', 'other', 'number']
with include_remaining
:
out = reordered(df, new_order, include_remaining=True)
print(out)
First Name other number last_name
0 Cthulhu NaN 666 Smith
without include_remaining
:
out = reordered(df, new_order, include_remaining=False)
print(out)
First Name other number
0 Cthulhu NaN 666
Answered by Cameron Riddell on January 29, 2021
Use insert
like this:
for col in TemplateColumns:
if col not in df:
df.insert(1, col, np.nan)
Answered by U11-Forward on January 29, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP