Stack Overflow. Asked by CHRD on November 24, 2021
Assume I have the following DataFrame df:
ID V
0 A 1
1 A 2
2 B 4
3 B 3
And the desired output is:
V
0 NaN
1 1.0
2 NaN
3 -1.0
This can be done using groupby and a lambda with diff:
df.groupby('ID').apply(lambda x: x.diff())
I am trying to come up with a solution that doesn't rely on a lambda, as this quickly becomes very slow. Any ideas?
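A minimal, self-contained version of the lambda approach, assuming pandas is installed. It restricts the lambda to the V column (so diff never touches the string ID column) and passes group_keys=False so the result keeps the original row index; without that, the index layout returned by groupby(...).apply differs between pandas versions.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({"ID": ["A", "A", "B", "B"], "V": [1, 2, 4, 3]})

# Baseline: call Series.diff once per group through a Python lambda.
# group_keys=False keeps the original row index instead of prepending ID.
baseline = df.groupby("ID", group_keys=False)["V"].apply(lambda s: s.diff())
print(baseline.tolist())  # [nan, 1.0, nan, -1.0]
```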
UPDATE
Performance comparison between (1) using groupby, a lambda and diff, and (2) using only groupby and diff:
(1) 3.67 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(2) 2.42 ms ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
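One way to run a comparison like this is a timeit sketch. The synthetic frame below (50,000 rows, 500 groups, fixed seed) is a hypothetical workload chosen for illustration, not the data behind the numbers above; absolute times depend on hardware and pandas version, so no expected timings are given.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 50,000 rows spread over 500 integer group labels.
rng = np.random.default_rng(0)
n_rows, n_groups = 50_000, 500
big = pd.DataFrame({
    "ID": rng.integers(0, n_groups, size=n_rows),
    "V": rng.integers(0, 100, size=n_rows),
})

def with_lambda(frame):
    # (1) groupby + lambda + diff: one Python-level call per group.
    return frame.groupby("ID", group_keys=False)["V"].apply(lambda s: s.diff())

def without_lambda(frame):
    # (2) groupby + diff: pandas computes the per-group difference itself.
    return frame.groupby("ID")["V"].diff()

# Both approaches must agree before their speed is worth comparing.
pd.testing.assert_series_equal(with_lambda(big), without_lambda(big), check_names=False)

for func in (with_lambda, without_lambda):
    seconds = timeit.timeit(lambda: func(big), number=5) / 5
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```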
Well, in this case, groupby objects directly support diff:
>>> df
ID V
0 A 1
1 A 2
2 B 4
3 B 3
>>> df.groupby('ID').diff()
V
0 NaN
1 1.0
2 NaN
3 -1.0
>>>
But I'm not sure this will actually improve your performance. Using .apply on columns, i.e. across the first axis, shouldn't be slower than the above; it is basically equivalent (unlike using .apply on the rows).
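The groupby(...).diff() call above compares each row with the previous row of the same group, not the previous row of the frame. The sketch below illustrates this with the two groups interleaved (a hypothetical reordering of the question's data) and checks the result against an equivalent formulation that subtracts a per-group shift.

```python
import pandas as pd

# Same values as the question, but with the A and B rows interleaved.
mixed = pd.DataFrame({"ID": ["A", "B", "A", "B"], "V": [1, 4, 2, 3]})

# diff is computed within each ID group; the output keeps the original row order.
per_group = mixed.groupby("ID")["V"].diff()
print(per_group.tolist())  # [nan, nan, 1.0, -1.0]

# Equivalent: subtract the previous value of the same group.
via_shift = mixed["V"] - mixed.groupby("ID")["V"].shift()
pd.testing.assert_series_equal(per_group, via_shift)
```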
Answered by juanpa.arrivillaga on November 24, 2021
Use .agg and pass 'diff':
df.groupby('ID')['V'].agg('diff')
0 NaN
1 1.0
2 NaN
3 -1.0
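A quick check, assuming pandas 1.x or 2.x, that passing the string 'diff' to agg on a grouped Series gives the same result as calling .diff() directly. In those versions a string passed to SeriesGroupBy.agg appears to be dispatched to the groupby method of that name, which would explain why a non-reducing method like diff works here.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "A", "B", "B"], "V": [1, 2, 4, 3]})

# agg with a method name as a string; here it should end up calling .diff().
via_agg = df.groupby("ID")["V"].agg("diff")
direct = df.groupby("ID")["V"].diff()

pd.testing.assert_series_equal(via_agg, direct)
print(via_agg.tolist())  # [nan, 1.0, nan, -1.0]
```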
Answered by wwnde on November 24, 2021