Data Science Asked by GGS on November 24, 2020
I have the following data set:
I want to use the attributes Tags and Authors to classify each record into its respective Rating, using a random forest classifier. My concern is how to deal with the Tags attribute. Each entry has a variable number of tags separated by commas. There are 4412 unique tags in total, and the entry with the most tags has 20 of them. The first entry has the tags ["Rhode Island", "Economy", "Taxes", "Lincoln Chafee"].
How should I encode this attribute so that I can use RandomForestClassifier from sklearn?
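from sklearn.preprocessing import MultiLabelBinarizer
# Each inner list holds the tags of one record
lb = MultiLabelBinarizer()
# One 0/1 column per unique tag; a 1 means the record carries that tag
lb.fit_transform([['A', 'B', 'C'], ['A', 'D', 'E', 'B']])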
array([[1, 1, 1, 0, 0],
       [1, 1, 0, 1, 1]])
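To read the matrix, it may help to check which tag each column stands for. The following is a minimal sketch on the same toy lists, not part of the original answer; it relies on the `classes_` attribute and the `inverse_transform` method of `MultiLabelBinarizer`.

```python
from sklearn.preprocessing import MultiLabelBinarizer

lb = MultiLabelBinarizer()
encoded = lb.fit_transform([['A', 'B', 'C'], ['A', 'D', 'E', 'B']])

# Column order follows the sorted unique labels seen during fit
columns = [str(c) for c in lb.classes_]

# inverse_transform maps each row of 0/1 indicators back to a tuple of labels
decoded = lb.inverse_transform(encoded)

print(columns)
print(decoded)
```

Note that the decoded labels come back in column order, so the original ordering of tags within a record is not preserved.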
Correct answer by 10xAI on November 24, 2020
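As a rough end-to-end illustration of how this could be applied to the data in the question, here is a hedged sketch. The column names Tags, Authors and Rating and the first row's tags come from the question; every other value (the remaining rows, the author names, the rating labels) is made up for illustration. It also assumes one author per record, and uses `sparse_output=True` because a dense matrix with 4412 tag columns would be mostly zeros.

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer, OneHotEncoder

# Hypothetical toy data: only the column names and the first row's tags
# come from the question; all other values are invented for this sketch.
df = pd.DataFrame({
    "Tags": [
        "Rhode Island,Economy,Taxes,Lincoln Chafee",
        "Economy,Jobs",
        "Taxes",
        "Jobs,Economy,Taxes",
    ],
    "Authors": ["author_a", "author_b", "author_a", "author_c"],
    "Rating": ["low", "high", "low", "high"],
})

# Split each comma-separated string into a list of stripped tag names
tag_lists = df["Tags"].apply(lambda s: [t.strip() for t in s.split(",")])

# One indicator column per unique tag, stored as a sparse matrix
mlb = MultiLabelBinarizer(sparse_output=True)
X_tags = mlb.fit_transform(tag_lists)

# One-hot encode Authors (assumed single-valued); unseen authors
# are encoded as all zeros at prediction time
ohe = OneHotEncoder(handle_unknown="ignore")
X_authors = ohe.fit_transform(df[["Authors"]])

# Stack both sparse blocks column-wise; the forest accepts sparse input
X = hstack([X_tags, X_authors]).tocsr()
y = df["Rating"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

print(X.shape)
```

With real data, the same fitted `mlb` and `ohe` objects would be reused to call `transform` on the test split, so that training and test matrices share the same columns.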