TransWikia.com

How to reorder the first n dataframe columns, and add remaining columns at the end?

Stack Overflow Asked by Chris Dixon on January 29, 2021

Unpredictably formatted df:

  First Name  number last_name
0    Cthulhu     666     Smith
    
df = pd.DataFrame({'First Name': ['Cthulhu'], 'number': [666], 'last_name': ['Smith']})

This needs to be put into column names and order: TemplateColumns = ['First Name', 'other', 'number']. If columns don’t exist they can be created:

for col in TemplateColumns:
    if col not in df:
        df[col] = np.nan

Which gives:

  First Name  number last_name  other
0    Cthulhu     666     Smith    NaN

And initial columns need to be ordered the same as TemplateColumns, leaving the remaining columns at the end, to get desired_df:

  First Name  other   number last_name
0    Cthulhu    NaN      666     Smith

desired_df = pd.DataFrame({'First Name': ['Cthulhu'], 'other': [np.nan], 'number': [666], 'last_name': ['Smith']})

Reordering columns is well explained in other posts, but I don’t know how to order the first n columns and keep the rest at the end. How can I do this?

3 Answers

Try this

cols = TemplateColumns + df.columns.difference(TemplateColumns, sort=False).tolist()
df_final =  df.reindex(cols, axis=1)

Out[714]:
  First Name  other  number last_name
0    Cthulhu    NaN     666     Smith

Answered by Andy L. on January 29, 2021

You can write your own function to achieve this. Essentially you can use .reindex() to reorder the dataframe while including empty columns if they don't exist. The only remaining part to figure out would be how to add the remaining columns not in TemplateColumns to your dataframe. You can do this by obtaining the set difference of the column index from the TemplateColumns then updating the order before your call to .reindex

Set up data & function

def reordered(df, new_order, include_remaining=True):
    cols_to_end = []
    if include_remaining:
        # gets the items in `df.columns` that are NOT in `new_order` 
        cols_to_end = df.columns.difference(new_order, sort=False)
    
    # Ensures that the new_order items are first
    final_order = new_order + list(cols_to_end)
    return df.reindex(columns=final_order)

df = pd.DataFrame({'First Name': ['Cthulhu'], 'number': [666], 'last_name': ['Smith']})
new_order = ['First Name', 'other', 'number']

with include_remaining:

out = reordered(df, new_order, include_remaining=True)

print(out)
  First Name  other  number last_name
0    Cthulhu    NaN     666     Smith

without include_remaining:

out = reordered(df, new_order, include_remaining=False)

print(out)
  First Name  other  number
0    Cthulhu    NaN     666

Answered by Cameron Riddell on January 29, 2021

Use insert like this:

for col in TemplateColumns:
    if col not in df:
        df.insert(1, col, np.nan)

Answered by U11-Forward on January 29, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP