Stack Overflow Asked by Abraham Arreola on January 8, 2021
I have a Panda’s dataframe like the following:
And i want to get the countries where its PIB in 2007 was less than in 2002, but i couldn´t code something to do that only using Pandas built in methods without use python iterations or something like that.
The most i’ve got is the following line:
df[df[df.year == 2007].PIB < df[df.year == 2002].PIB].country
But i get the following error:
ValueError: Can only compare identically-labeled Series objects
Till’ now i’ve only used Pandas to filter data from different columns, but i don’t know how to compare data from the same column, in this case the year.
Any support is welcome.
I suggest create Series
with index by country
column, but is necessary same number of countries in 2007
and 2002
for compare Series with same index values:
df = pd.DataFrame({'country': ['Afganistan', 'Zimbabwe', 'Afganistan', 'Zimbabwe'],
'PIB': [200, 200, 100, 300],
'year': [2002, 2002, 2007, 2007]})
print (df)
country PIB year
0 Afganistan 200 2002
1 Zimbabwe 200 2002
2 Afganistan 100 2007
3 Zimbabwe 300 2007
df = df.set_index('country')
print (df)
PIB year
country
Afganistan 200 2002
Zimbabwe 200 2002
Afganistan 100 2007
Zimbabwe 300 2007
s1 = df.loc[df.year == 2007, 'PIB']
s2 = df.loc[df.year == 2002, 'PIB']
print (s1)
country
Afganistan 100
Zimbabwe 300
Name: PIB, dtype: int64
print (s2)
country
Afganistan 200
Zimbabwe 200
Name: PIB, dtype: int64
countries = s1.index[s1 < s2]
print (countries)
Index(['Afganistan'], dtype='object', name='country')
Another idea is first pivoting by DataFrame.pivot
and then seelct columns by years and compare with index in boolean indexing
:
df1 = df.pivot('country','year','PIB')
print (df1)
year 2002 2007
country
Afganistan 200 100
Zimbabwe 200 300
countries = df1.index[df1[2007] < df1[2002]]
print (countries)
Index(['Afganistan'], dtype='object', name='country')
Correct answer by jezrael on January 8, 2021
Here is my dataframe :
df = pd.DataFrame([
{"country": "a", "PIB": 2, "year": 2002},
{"country": "b", "PIB": 2, "year": 2002},
{"country": "a", "PIB": 1, "year": 2007},
{"country": "b", "PIB": 3, "year": 2007},
])
If I filter the two year 2002 and 2007, I got.
df_2002 = df[df["year"] == 2007]
out :
country PIB year
0 a 2 2002
1 b 2 2002
df_2007 = df[df["year"] == 2007]
out :
country PIB year
2 a 1 2007
3 b 3 2007
You want to compare the evolution of the PIB for each country.
Pandas is unaware of that, it tries to compare values but here based on the same index. Witch is not what you want, and it's impossible because the indexes are different.
So you just need to use set_index()
df.set_index("country", inplace=True)
df_2002 = df[df["year"] == 2007]
out :
PIB year
country
a 1 2007
b 3 2007
df_2007 = df[df["year"] == 2007]
out :
PIB year
country
a 2 2002
b 2 2002
now you can make the comparison
df_2002.PIB > df_2007.PIB
out:
country
a True
b False
Name: PIB, dtype: bool
# to get the list of countries
(df_2002.PIB > df_2007.PIB)[res == True].index.values.tolist()
out :
['a']
Answered by AlexisG on January 8, 2021
My strategy is using the pivot_table. There is an assumption that there are no two rows having the same ('country','year') pair. With this assumption, aggfunc=np.sum
represents the only single PIB
value.
table = pd.pivot_table(df, values='PIB', index=['country'],
columns=['year'], aggfunc=np.sum)[[2002,2007]]
list(table[table[2002] > table[2007]].index)
The pivot_table looks like this:
Answered by jhihan on January 8, 2021
Try this (considering that you want just the list of these countries):
[i for i in df.country if df[(df.country==i) & (df.year==2007)].PIB.iloc[0] < df[(df.country==i) & (df.year==2002)].PIB.iloc[0]]
Answered by IoaTzimas on January 8, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP