TransWikia.com

How can I count duplicate rows in a pandas DataFrame?

Stack Overflow Asked by toby chamberlain on January 10, 2021

My initial dataframe is:

    Name        Info1        Info2
0  Name1  Name1-Info1  Name1-Info2
1  Name1  Name1-Info1  Name1-Info2
2  Name1  Name1-Info1  Name1-Info2
3  Name2  Name2-Info1  Name2-Info2
4  Name2  Name2-Info1  Name2-Info2

and I would like to return the number of repetitions of each row, like so:

    Name        Info1        Info2  Count
0  Name1  Name1-Info1  Name1-Info2      3
1  Name2  Name2-Info1  Name2-Info2      2

How can I count the duplicate rows in a pandas DataFrame?

4 Answers

df.groupby(['Name', 'Info1', 'Info2']).size().reset_index(name='Count')

Correct answer by Tom Ron on January 10, 2021
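
To try the accepted approach end to end, here is a minimal sketch that rebuilds the frame from the question and groups on every column. The variable name `counts` and the use of `list(df.columns)` in place of the hard-coded column list are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name":  ["Name1", "Name1", "Name1", "Name2", "Name2"],
    "Info1": ["Name1-Info1"] * 3 + ["Name2-Info1"] * 2,
    "Info2": ["Name1-Info2"] * 3 + ["Name2-Info2"] * 2,
})

# Group on every column so only fully identical rows are counted together.
# size() gives the number of rows per group; reset_index(name=...) turns the
# group keys back into columns and names the count column.
counts = df.groupby(list(df.columns)).size().reset_index(name="Count")
print(counts)
```

Grouping on all columns rather than on `Name` alone means rows that share a name but differ in `Info1` or `Info2` are counted separately.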

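# Assumes Info1 and Info2 never vary within a Name, since only Name is grouped on.
# sort=False keeps groups in order of first appearance, matching head(1) below.
size = df.groupby('Name', sort=False).size().tolist()
df = df.groupby('Name', sort=False).head(1).reset_index(drop=True)
df['Count'] = size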

Answered by Sam S on January 10, 2021

Given your example df:

    Name        Info1        Info2
0  Name1  Name1-Info1  Name1-Info2
1  Name1  Name1-Info1  Name1-Info2
2  Name1  Name1-Info1  Name1-Info2
3  Name2  Name1-Info2  Name1-Info2
4  Name2  Name1-Info2  Name1-Info2

The following:

df.pivot_table(index=list(df), aggfunc='size')

will return what you're after, as a Series indexed by the three columns:

Name   Info1        Info2
Name1  Name1-Info1  Name1-Info2    3
Name2  Name1-Info2  Name1-Info2    2

Answered by JPI93 on January 10, 2021
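
If a flat DataFrame with a `Count` column is wanted instead of the indexed result shown above, one possible follow-up is to call `reset_index` on it. This is a sketch using the table printed in this answer; the names `sizes` and `result` are illustrative.

```python
import pandas as pd

# Data as printed in this answer (the Name2 rows differ from the question's)
df = pd.DataFrame({
    "Name":  ["Name1"] * 3 + ["Name2"] * 2,
    "Info1": ["Name1-Info1"] * 3 + ["Name1-Info2"] * 2,
    "Info2": ["Name1-Info2"] * 5,
})

# With every column used as the index and aggfunc='size', the result is
# keyed by the unique rows and holds how often each one occurs
sizes = df.pivot_table(index=list(df), aggfunc="size")

# reset_index(name=...) moves the keys back into columns and names the counts
result = sizes.reset_index(name="Count")
print(result)
```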

Add a helper column 'count' set to 1, then group by the original columns and sum it:

df['count'] = 1
df.groupby(['Name', 'Info1', 'Info2'])['count'].sum().reset_index()

Answered by EddyG on January 10, 2021
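
Note that assigning `df['count'] = 1` leaves the helper column on the original frame. A variant that avoids this, sketched here with the question's data, uses `assign` to add the column to a copy; the name `result` is illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name":  ["Name1", "Name1", "Name1", "Name2", "Name2"],
    "Info1": ["Name1-Info1"] * 3 + ["Name2-Info1"] * 2,
    "Info2": ["Name1-Info2"] * 3 + ["Name2-Info2"] * 2,
})

# assign() returns a new frame with the helper column, so df is not modified
result = (
    df.assign(count=1)
      .groupby(["Name", "Info1", "Info2"])["count"]
      .sum()
      .reset_index()
)
print(result)
```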
