Stack Overflow Asked by Sophia Deng on January 3, 2022
For example, I have a data sheet like here Datasample, I have defined 4 headers, I understand I can access one of the last header by using df[‘2020-01-01’, ‘A’, C1′, ‘D1’]. What if I want to read both ‘D1’ and ‘D2’? Or ‘D1’ and ‘D5’? What would be the right code, thank you!
First, we create a DataFrame (with a MultiIndex) like the one in your example:
import numpy as np
import pandas as pd
data = np.arange(11 * 8).reshape((11, 8))
midx = pd.MultiIndex.from_tuples(
[
('2020-01-01', 'A', 'C1', 'D1'),
('2020-01-01', 'A', 'C1', 'D2'),
('2020-01-01', 'A', 'C2', 'D3'),
('2020-01-01', 'A', 'C2', 'D4'),
('2020-01-01', 'B', 'C3', 'D5'),
('2020-01-01', 'B', 'C3', 'D6'),
('2020-01-01', 'B', 'C4', 'D7'),
('2020-01-01', 'B', 'C4', 'D8'),
],
names=('y', 'a', 'b', 'c',)
)
df = pd.DataFrame(data=data, columns=midx)
df
Now, we use the somewhat cumbersome 'index slice' to extract the columns of interest. Here are a few examples:
# get the 'A' columns
df.loc[:, (slice(None), 'A')]
# get the 'C1' column
df.loc[:, (slice(None), slice(None), 'C1')]
# get the D1 and D5 columns
df.loc[:, (slice(None), slice(None), slice(None), ['D1', 'D5'])]
y 2020-01-01
a A B
b C1 C3
c D1 D5
0 0 4
1 8 12
2 16 20
3 24 28
4 32 36
5 40 44
6 48 52
7 56 60
8 64 68
9 72 76
10 80 84
Here is the MultiIndex after performing the last operation:
df.loc[:, (slice(None), slice(None), slice(None), ['D1', 'D5'])].columns
MultiIndex([('2020-01-01', 'A', 'C1', 'D1'),
('2020-01-01', 'B', 'C3', 'D5')],
names=['y', 'a', 'b', 'c'])
Docs are here: https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html
Answered by jsmart on January 3, 2022
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP