Data Science Asked by Denis L on August 19, 2020
Suppose we have the following DataFrame, in which one column holds multiple values per row:
categories
0 - ["A", "B"]
1 - ["B", "C", "D"]
2 - ["B", "D"]
How can we get a table like this?
    "A" "B" "C" "D"
0 -  1   1   0   0
1 -  0   1   1   1
2 -  0   1   0   1
Note: I don’t necessarily need a new DataFrame; I’m wondering how to transform such DataFrames into a format more suitable for machine learning.
If [0, 1, 2] are numerical labels rather than the index, then pandas.DataFrame.pivot_table works:
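In []:
import pandas as pd

# Long format: one (label, category) pair per row
data = pd.DataFrame.from_records(
    [[0, 'A'], [0, 'B'], [1, 'B'], [1, 'C'], [1, 'D'], [2, 'B'], [2, 'D']],
    columns=['number_label', 'category'])
# Count the rows for each (label, category) pair; absent pairs become 0
data.pivot_table(index=['number_label'], columns=['category'],
                 aggfunc=[len], fill_value=0)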
Out[]:
              len
category        A  B  C  D
number_label
0               1  1  0  0
1               0  1  1  1
2               0  1  0  1
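For the same long-format case, pandas.crosstab may be a simpler route, since it counts label/category pairs directly and needs no values column or aggregation function. A minimal sketch, rebuilding the same frame as above (the name counts is only illustrative):

```python
import pandas as pd

# Same long-format data as above: one (label, category) pair per row
data = pd.DataFrame.from_records(
    [[0, 'A'], [0, 'B'], [1, 'B'], [1, 'C'], [1, 'D'], [2, 'B'], [2, 'D']],
    columns=['number_label', 'category'])

# Cross-tabulate: rows are labels, columns are categories, cells are counts
counts = pd.crosstab(data['number_label'], data['category'])
print(counts)
```

The result has a single level of column labels (A to D), which can be easier to feed into a model than the two-level columns produced by aggfunc=[len].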
This blog post was helpful.
If [0, 1, 2] is the index, then collections.Counter is useful:
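In []:
import collections

data2 = pd.DataFrame.from_dict(
    {'categories': {0: ['A', 'B'], 1: ['B', 'C', 'D'], 2: ['B', 'D']}})
# Count the categories in each row's list
data3 = data2['categories'].apply(collections.Counter)
# One column per category; categories missing from a row become 0
pd.DataFrame.from_records(data3).fillna(value=0)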
Out[]:
   A  B  C  D
0  1  1  0  0
1  0  1  1  1
2  0  1  0  1
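When the lists sit in a column and [0, 1, 2] is the index, another possible route is Series.explode followed by pandas.get_dummies. This is a minimal sketch, assuming pandas 0.25 or later (where explode is available); the names exploded and indicators are illustrative.

```python
import pandas as pd

data2 = pd.DataFrame(
    {'categories': [['A', 'B'], ['B', 'C', 'D'], ['B', 'D']]})

# One row per (index, category) pair; the original index value is repeated
exploded = data2['categories'].explode()

# Indicator columns per category, then collapse back to one row per index
indicators = (pd.get_dummies(exploded)
              .groupby(level=0)
              .sum()
              .astype(int))
print(indicators)
```

Like the Counter approach, the sum counts repeats, so a category listed twice in one row would yield 2. Replacing .sum() with .max() would give strict 0/1 indicators instead.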
Correct answer by Samuel Harrold on August 19, 2020