Data Science Asked by Taylrl on August 13, 2021
I want to be able to correlate values from various IDs, where the date is the same with one another.
The data looks something like this;
ID Time(secs) Date
AAAA 1 01/01/1990
AAAA 6 02/01/1990
AAAA 5 03/01/1990
AAAA 2 04/01/1990
AAAA 4 05/01/1990
BBBB 2 01/01/1990
BBBB 4 02/01/1990
BBBB 6 03/01/1990
BBBB 3 04/01/1990
CCCC 3 01/01/1990
CCCC 4 02/01/1990
CCCC 1 03/01/1990
CCCC 6 04/01/1990
DDDD 7 01/01/1990
DDDD 4 02/01/1990
DDDD 5 03/01/1990
DDDD 3 04/01/1990
I want to find the correlation coefficient between each combination of these IDs, where there are matching dates. N.B. Not all of the IDs have the same date;
ID CorrCoef
AAAA>BBBB ????
AAAA>CCCC
AAAA>DDDD
BBBB>CCCC
BBBB>DDDD
CCCC>DDDD
I think I need to feed the data from each ID into a variable and then run the following;
data1.corr(data2)
If i understand correctly you can do something like this:
In [42]: df
Out[42]:
ID Time(secs) Date
0 AAAA 1 01/01/1990
1 AAAA 6 02/01/1990
2 AAAA 5 03/01/1990
3 AAAA 2 04/01/1990
4 AAAA 4 05/01/1990
5 BBBB 2 01/01/1990
6 BBBB 4 02/01/1990
7 BBBB 6 03/01/1990
8 BBBB 3 04/01/1990
9 CCCC 3 01/01/1990
10 CCCC 4 02/01/1990
11 CCCC 1 03/01/1990
12 CCCC 6 04/01/1990
13 DDDD 7 01/01/1990
14 DDDD 4 02/01/1990
15 DDDD 5 03/01/1990
16 DDDD 3 04/01/1990
In [43]: df.pivot(index='Date', columns='ID', values='Time(secs)').corr()
Out[43]:
ID AAAA BBBB CCCC DDDD
ID
AAAA 1.000000 0.778924 -0.336336 -0.368964
BBBB 0.778924 1.000000 -0.609449 -0.257143
CCCC -0.336336 -0.609449 1.000000 -0.609449
DDDD -0.368964 -0.257143 -0.609449 1.000000
Virtual helper DF:
In [44]: df.pivot(index='Date', columns='ID', values='Time(secs)')
Out[44]:
ID AAAA BBBB CCCC DDDD
Date
01/01/1990 1.0 2.0 3.0 7.0
02/01/1990 6.0 4.0 4.0 4.0
03/01/1990 5.0 6.0 1.0 5.0
04/01/1990 2.0 3.0 6.0 3.0
05/01/1990 4.0 NaN NaN NaN
Answered by MaxU on August 13, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP