Stack Overflow Asked by Muhtadi on December 22, 2020
I have a nested list like this:
data = [[[], 'October'],
[[], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[['tiktok', 'tenaga kesehatan'], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[['kanker'], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[['jantung'], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[['jantung'], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[['covid-19', 'covid-19'], 'October'],
[['covid-19'], 'October'],
[[], 'September'],
[['covid-19'], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[[], 'September'],
[['covid-19', 'covid-19'], 'September'],
[['jantung'], 'September'],
[['jantung'], 'September'],
[['covid-19'], 'September'],
[['covid-19'], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[[], 'August'],
[[], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[['jantung'], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[['covid-19', 'covid-19'], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'July']]
I want to count all the tokens ('covid-19', 'jantung', etc.) by month name, so that I get each token's frequency per month.
Here's my expected output:
result = [
['covid-19',0,0,0,0,0,0,1,19,17,21,0,0],
['tiktok',0,0,0,0,0,0,0,0,0,1,0,0],
['jantung',0,0,0,0,0,0,0,1,2,2,0,0],
['kanker',0,0,0,0,0,0,0,0,0,1,0,0],
['tenaga kesehatan',0,0,0,0,0,0,0,0,0,1,0,0],
]
Note that '0,0,0,0,0,0,1,19,17,21,0,0' is in order from January to December, and each number is the count of that token in that month. Please suggest a way to convert the nested list into the result list.
Any ideas?
Here we go with a possible solution:
import calendar
data = [[[], 'October'],
[[], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[['tiktok', 'tenaga kesehatan'], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[['kanker'], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[['jantung'], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[[], 'October'],
[['covid-19'], 'October'],
[[], 'October'],
[['jantung'], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[['covid-19'], 'October'],
[['covid-19', 'covid-19'], 'October'],
[['covid-19'], 'October'],
[[], 'September'],
[['covid-19'], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[[], 'September'],
[['covid-19', 'covid-19'], 'September'],
[['jantung'], 'September'],
[['jantung'], 'September'],
[['covid-19'], 'September'],
[['covid-19'], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[[], 'September'],
[['covid-19'], 'September'],
[[], 'September'],
[[], 'August'],
[[], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[['jantung'], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'August'],
[[], 'August'],
[['covid-19'], 'August'],
[['covid-19', 'covid-19'], 'August'],
[['covid-19'], 'August'],
[['covid-19'], 'July']]
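final = []
for el in data:
    if len(el[0]) > 0:
        for key in el[0]:
            # add a new row [token, 0, ..., 0] the first time a token is seen
            if key not in [sub[0] for sub in final]:
                final.append([key] + [0]*12)
            for sub in final:
                if sub[0] == key:
                    # month_abbr is ['', 'Jan', ..., 'Dec'], so the index matches the row position
                    sub[list(calendar.month_abbr).index(el[-1][:3])] += 1
print(final)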
The output will be:
[['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0], ['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], ['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0]]
NOTE: As someone mentioned, however, it might be a good idea to use a different data structure to store the result. Surely a dictionary would be more convenient and would allow you to write a more linear solution.
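The dictionary idea from the note could look roughly like the sketch below. It is not the code from this answer: it swaps in `collections.Counter` keyed by (token, month position) pairs and only builds the list rows at the end. The names `sample`, `month_pos`, `counts` and `rows` are made up for illustration, and `sample` is a shortened stand-in for `data`.

```python
import calendar
from collections import Counter

# Hypothetical shortened sample in the same shape as `data` from the question.
sample = [
    [[], 'October'],
    [['covid-19'], 'October'],
    [['tiktok', 'tenaga kesehatan'], 'October'],
    [['covid-19', 'covid-19'], 'September'],
    [['jantung'], 'August'],
    [['covid-19'], 'July'],
]

# Map full month names to 1-based positions: {'January': 1, ..., 'December': 12}.
# calendar.month_name follows the current locale (English unless changed).
month_pos = {name: i for i, name in enumerate(calendar.month_name) if name}

# Count every (token, month position) pair in one pass.
counts = Counter(
    (token, month_pos[month])
    for tokens, month in sample
    for token in tokens
)

# Tokens in order of first appearance (dicts keep insertion order).
tokens_seen = dict.fromkeys(token for token, _ in counts)

# Build rows in the format the question asks for: [token, Jan, ..., Dec].
rows = [
    [token] + [counts[(token, m)] for m in range(1, 13)]
    for token in tokens_seen
]

for row in rows:
    print(row)
```

Looking up a missing key in a `Counter` returns 0, so months without any hits need no special handling.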
Correct answer by lorenzozane on December 22, 2020
Solve the same problem with functional programming, perhaps.
from functional import seq

# `data` is the nested list from the question
NONE_KEY = 'NONE'
MONTHS = {
    'January': 1,
    'February': 2,
    'March': 3,
    'April': 4,
    'May': 5,
    'June': 6,
    'July': 7,
    'August': 8,
    'September': 9,
    'October': 10,
    'November': 11,
    'December': 12
}

def reGroupByFirstItem(d):
    # turn [[tokens], month] into a list of (token, month) pairs
    if len(d[0]) > 0:
        return seq(d[0]).map(lambda key: (key, d[1])).to_list()
    else:
        # entries without tokens are counted under NONE_KEY
        return [(NONE_KEY, d[1])]

def hasKey(l, key):
    return seq(l).filter(lambda x: x[0] == key).len() > 0

def getIndexByKey(ll, key):
    for i in range(len(ll)):
        if ll[i][0] == key:
            return i

def initList(key):
    # row layout: [key, Jan, Feb, ..., Dec]
    l = [0 for x in range(12)]
    l.insert(0, key)
    return l

def updateList(l, month):
    l[ MONTHS[month] ] += 1
    return l

def updateByKey(ll, key, val):
    i = getIndexByKey(ll, key)
    ll[i] = updateList(ll[i], val)
    return ll

def initListWithValue(key, val):
    l = initList(key)
    return updateList(l, val)

def createNewList(nextItem, current):
    key = nextItem[0]
    val = nextItem[1]
    if hasKey(current, key):
        current = updateByKey(current, key, val)
    else:
        current.append(initListWithValue(key, val))
    return current

# the parentheses let the method chain span several lines
result = (seq(data)
    .map(reGroupByFirstItem)
    .flatten()
    .fold_right([], createNewList))
print(result)
You need to install PyFunctional first:
pip install pyfunctional
Full documentation is here: https://docs.pyfunctional.pedro.ai/en/latest/index.html
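If installing an extra package is not an option, a similar fold can be sketched with the standard library's `functools.reduce`. This is only an approximation of the approach above, not a drop-in replacement: it folds from left to right, skips empty token lists instead of counting them under a 'NONE' key, and accumulates into a dict. The names `sample`, `MONTH_NAMES`, `add_token` and `totals` are hypothetical.

```python
from functools import reduce

# Fixed English month names; the position in this list is the 0-based month index.
MONTH_NAMES = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

# Hypothetical shortened sample in the same shape as `data` from the question.
sample = [
    [['covid-19'], 'October'],
    [[], 'October'],
    [['tiktok', 'tenaga kesehatan'], 'October'],
    [['covid-19'], 'September'],
]

def add_token(acc, pair):
    """Fold step: bump the counter for one (token, month) pair."""
    token, month = pair
    counts = acc.setdefault(token, [0] * 12)
    counts[MONTH_NAMES.index(month)] += 1
    return acc  # the accumulator is mutated in place and passed along

# Flatten [[tokens], month] entries into (token, month) pairs.
pairs = ((token, month) for tokens, month in sample for token in tokens)

totals = reduce(add_token, pairs, {})
print(totals)
```

Mutating the accumulator is not purely functional, but it avoids copying the whole dict on every step.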
Answered by Joel Chu on December 22, 2020
While others have written really good answers, I feel solving this via pandas is both more maintainable and more verbose. Plus, pandas objects are really easy to work with.
First the imports:
import pandas as pd
import calendar
from pprint import pprint
Here's the main body of code:
# `data` is the nested list from the question
df = pd.DataFrame(data, columns=["lists", "month"])
names = list(set([y for x in df["lists"] for y in x]))  # unique tokens
df[names] = 0  # one counter column per token

def func(row):
    # count how often each token occurs in this row's list
    for n in names:
        for k in row["lists"]:
            if k == n:
                row[n] += 1
    return row

df = df.apply(func, axis=1)
df.drop(["lists"], inplace=True, axis=1)
new_df = df.groupby(by="month").sum().T.reset_index()
new_df.columns.name = None  # Just for my taste to remove the "month" label of groupby result
months = list(calendar.month_name)[1:]  # list of months. There's an empty string at index 0.
new_df[[m for m in months if m not in new_df.columns]] = 0  # creating columns for unseen months
new_df = new_df[["index"] + months]  # sorting the months
print(new_df)
pprint(new_df.values.tolist())
The output will be:
index January February ... October November December
0 kanker 0 0 ... 1 0 0
1 covid-19 0 0 ... 19 0 0
2 jantung 0 0 ... 2 0 0
3 tiktok 0 0 ... 1 0 0
4 tenaga kesehatan 0 0 ... 1 0 0
[5 rows x 13 columns]
[['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]]
Because names is built from a set, the row order can differ between runs, for example:
index January February ... October November December
0 tenaga kesehatan 0 0 ... 1 0 0
1 covid-19 0 0 ... 19 0 0
2 kanker 0 0 ... 1 0 0
3 jantung 0 0 ... 2 0 0
4 tiktok 0 0 ... 1 0 0
[5 rows x 13 columns]
[['tenaga kesehatan', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
['covid-19', 0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
['kanker', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
['jantung', 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
['tiktok', 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]]
Answered by Farhood ET on December 22, 2020
You really shouldn't be storing different kinds of data in a list like that. How about something that looks like this?
{'covid-19': [0, 0, 0, 0, 0, 0, 1, 17, 15, 19, 0, 0],
 'jantung': [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0],
 'kanker': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 'tenaga kesehatan': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
 'tiktok': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]}
And here's a code snippet to make this dict:
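import datetime
from collections import defaultdict

# one list of 12 monthly counters (January to December) per token
result = defaultdict(lambda: [0]*12)
for i in data:
    if i[0]:
        for j in i[0]:
            # %B parses the full month name; .month is 1-based, so subtract 1
            result[j][datetime.datetime.strptime(i[1], "%B").month - 1] += 1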
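If the list-of-lists layout from the question is still needed, the dict can be flattened afterwards. The sketch below also replaces `strptime` with a fixed English lookup table, since the `%B` directive depends on the active locale. `MONTH_INDEX`, `sample` and `rows` are made-up names, and `sample` is a shortened stand-in for `data`.

```python
from collections import defaultdict

# Fixed English month names, so the lookup does not depend on the locale.
MONTHS = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']
MONTH_INDEX = {name: i for i, name in enumerate(MONTHS)}  # 0-based

# Hypothetical shortened sample in the same shape as `data` from the question.
sample = [
    [['covid-19'], 'October'],
    [[], 'October'],
    [['kanker'], 'October'],
    [['jantung'], 'September'],
    [['covid-19', 'covid-19'], 'August'],
]

result = defaultdict(lambda: [0] * 12)
for tokens, month in sample:
    for token in tokens:  # empty token lists simply contribute nothing
        result[token][MONTH_INDEX[month]] += 1

# Flatten each dict entry into [token, Jan, ..., Dec].
rows = [[token, *counts] for token, counts in result.items()]

for row in rows:
    print(row)
```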
Answered by AntiMatterDynamite on December 22, 2020
I suggest changing the nested list into a dictionary like this:
{
    "October": {
        "covid-19": 8,
        "jantung": 5
    },
    "November": {...},
    ...
}
or like this:
{
    "covid-19": {
        "October": 8,
        "November": 5
    },
    "jantung": {...},
    ...
}
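Both layouts can be built in a few lines with `collections.Counter`. This is only a sketch under assumptions: `sample` is a shortened stand-in for the nested list from the question, and `by_month` and `by_token` are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical shortened sample in the same shape as `data` from the question.
sample = [
    [['covid-19'], 'October'],
    [['jantung'], 'October'],
    [[], 'October'],
    [['covid-19', 'covid-19'], 'September'],
    [['jantung'], 'August'],
]

# First layout: month -> {token: count}
by_month = defaultdict(Counter)
for tokens, month in sample:
    by_month[month].update(tokens)  # adds 1 per occurrence of each token

# Second layout: token -> {month: count}
by_token = defaultdict(Counter)
for tokens, month in sample:
    for token in tokens:
        by_token[token][month] += 1

print(dict(by_month))
print(dict(by_token))
```

A `Counter` returns 0 for keys it has not seen, so `by_token['covid-19']['November']` works even though November never appears in the sample.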
Answered by Aqil Fiqran on December 22, 2020