Data Science Asked by martina on September 9, 2020
Suppose I have a JSON file like this:
[
  {
    "id": "0",
    "name": "name0",
    "first_sent": "date0",
    "analytics": [
      {
        "a": 1,
        ...
      },
      {
        "a": 2,
        ...
      }
    ]
  }
]
and I want to parse it with Pandas. So I load it with
df = pd.read_json('file.json')
It’s all good until I try to access and count the number of dictionaries in the “analytics” field for each item, for which task I haven’t found any better way than
for i in range(df.shape[0]):
    num = len(df[i:i+1]['analytics'][i])
But this looks inelegant and misses the point of using Pandas in the first place. I also need to be able to access the fields within “analytics” for each item.
The question is how to use Pandas to access fields within a field (which maps to a Series object), without reverting to non-Pandas approaches.
A head of the DataFrame looks like this (only fields ‘id’ and ‘analytics’ reported):
0 [{u'a': 0.0, u'b...
1 [{u'a': 0.01, u'b...
2 [{u'a': 0.4, u'b...
3 [{u'a': 0.2, u'b...
Name: analytics, dtype: object
0 '0'
1 '1'
2 '2'
3 '3'
The first number is obviously the index, the string is the ‘id’, and it is clear that ‘analytics’ appears as a Series.
Multi-indexing might be helpful. See this.
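To make the multi-indexing idea a bit more concrete, here is a rough sketch. It assumes every "analytics" list is non-empty and that its elements are flat dictionaries. The sample records, the "b" field, and the helper column name pos are all made up for illustration.

```python
import pandas as pd

# Hypothetical records mirroring the structure in the question;
# the "b" field, the second item, and all values are made up.
records = [
    {"id": "0", "name": "name0", "first_sent": "date0",
     "analytics": [{"a": 1, "b": 10}, {"a": 2, "b": 20}]},
    {"id": "1", "name": "name1", "first_sent": "date1",
     "analytics": [{"a": 3, "b": 30}]},
]
df = pd.DataFrame(records)

# One row per dictionary instead of one row per item.
exploded = df[["id", "analytics"]].explode("analytics").reset_index(drop=True)

# Position of each dictionary within its parent item ("pos" is an assumed name).
exploded["pos"] = exploded.groupby("id").cumcount()

# Turn the dictionaries into columns and index them by (id, pos).
fields = pd.DataFrame(exploded["analytics"].tolist())
fields.index = pd.MultiIndex.from_frame(exploded[["id", "pos"]])

# Field "a" of the second dictionary of the item with id "0".
print(fields.loc[("0", 1), "a"])

# Number of dictionaries per item.
print(fields.groupby(level="id").size())
```

With the (id, pos) index in place, each nested field becomes an ordinary column, so the usual selection and groupby tools apply.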
But the below was the immediate solution that came to mind. I think it's a little more elegant than what you came up with (fewer obscure numbers, more interpretable natural language):
import pandas as pd
df = pd.read_json('test_file.json')
df = pd.concat([df, df])  # just to give us an extra row to loop through below (DataFrame.append was removed in pandas 2.0)
df.reset_index(inplace=True)  # not really important except to distinguish your rows
for _, row in df.iterrows():
    currNumbDict = len(row['analytics'])
    print(currNumbDict)
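If the goal is to avoid the explicit loop altogether, something like the following may also work. Treat it as a sketch under assumptions: the records are invented to mirror the structure in the question (the "b" field and the second item are not from the original file), and n_analytics is just a column name picked for the example. It counts the dictionaries per row with apply(len) and flattens the nested fields with pd.json_normalize.

```python
import pandas as pd

# Hypothetical records mirroring the structure in the question;
# the "b" field, the second item, and all values are made up.
records = [
    {"id": "0", "name": "name0", "first_sent": "date0",
     "analytics": [{"a": 1, "b": 10}, {"a": 2, "b": 20}]},
    {"id": "1", "name": "name1", "first_sent": "date1",
     "analytics": [{"a": 3, "b": 30}]},
]
df = pd.DataFrame(records)

# Number of dictionaries in "analytics" for each row, without a Python-level loop
# over the DataFrame ("n_analytics" is an assumed column name).
df["n_analytics"] = df["analytics"].apply(len)
print(df["n_analytics"].tolist())  # [2, 1]

# One row per dictionary, with the nested fields as columns and the
# parent "id" and "name" carried along as metadata.
flat = pd.json_normalize(records, record_path="analytics", meta=["id", "name"])
print(flat)
```

Note that json_normalize works on the parsed JSON (a list of dicts), so with a real file you would load it with json.load first rather than pd.read_json.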
Correct answer by Russell Richie on September 9, 2020