Data Science Asked by celsowm on April 30, 2021
I have the following problem: I have two different sets of labels (extracted using NER). Given a combination of labels from the first set (a, b, c or d), I have a supervised "answer": the best combination of labels from the second set (x, y, z).
The problem is that both combinations can vary in size.
Hypothetical training data would look something like:
{a1,b2,c4,d1} -> {x2,y4,z5}
{a1,b1,c1} -> {x2,y2,z1}
{a4,b2,c4,d1} -> {x1,y3,z5}
...
{a4,b2,c4,d3} -> {x1,y3,z5,w2}
Of course, new combinations of the first set will appear, and I'd like to use ML to give the best prediction for them.
So, what would be the best machine learning approach for that situation?
This could be modeled as multi-label classification. The features are nominal values, and the targets are the presence or absence of each nominal value from the second set.
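To make that framing concrete, here is a minimal sketch of one way to turn variable-size label sets into fixed-length 0/1 vectors. The four pairs are the ones shown in the question; the helper names (`build_vocab`, `multi_hot`) and the choice of a sorted vocabulary are assumptions made for illustration, not something the question prescribes.

```python
# Toy pairs copied from the question (only the rows that are shown there).
pairs = [
    ({"a1", "b2", "c4", "d1"}, {"x2", "y4", "z5"}),
    ({"a1", "b1", "c1"}, {"x2", "y2", "z1"}),
    ({"a4", "b2", "c4", "d1"}, {"x1", "y3", "z5"}),
    ({"a4", "b2", "c4", "d3"}, {"x1", "y3", "z5", "w2"}),
]


def build_vocab(label_sets):
    """Hypothetical helper: sorted list of every label seen in the sets."""
    return sorted(set().union(*label_sets))


def multi_hot(label_set, vocab):
    """Hypothetical helper: 1 where the label is present, 0 where it is absent."""
    return [1 if label in label_set else 0 for label in vocab]


input_vocab = build_vocab([inp for inp, _ in pairs])
target_vocab = build_vocab([out for _, out in pairs])

# Every row now has the same length, whatever the size of the original set.
X = [multi_hot(inp, input_vocab) for inp, _ in pairs]
Y = [multi_hot(out, target_vocab) for _, out in pairs]

print(input_vocab)
print(X[0], "->", Y[0])
```

Each column of `Y` is then a separate yes/no target, which is what makes the problem multi-label rather than multi-class.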
There is a wide variety of algorithms that can learn multi-label classification. Which one is "best" is an empirical question that depends on the specific dataset. One popular option is a random forest classifier.
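As a rough sketch of the random forest option, assuming scikit-learn is available: `MultiLabelBinarizer` encodes both sides as 0/1 indicator matrices, and `RandomForestClassifier` accepts such a matrix as a multi-label target. The data are the four rows shown in the question, and the unseen combination `{a4, b2, c4}` is made up for illustration. Four rows are far too few to learn anything meaningful, so treat this as a demonstration of the plumbing only.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy rows copied from the question.
inputs = [
    {"a1", "b2", "c4", "d1"},
    {"a1", "b1", "c1"},
    {"a4", "b2", "c4", "d1"},
    {"a4", "b2", "c4", "d3"},
]
targets = [
    {"x2", "y4", "z5"},
    {"x2", "y2", "z1"},
    {"x1", "y3", "z5"},
    {"x1", "y3", "z5", "w2"},
]

# One encoder per side, so each keeps its own label vocabulary.
input_encoder = MultiLabelBinarizer()
target_encoder = MultiLabelBinarizer()
X = input_encoder.fit_transform(inputs)
Y = target_encoder.fit_transform(targets)

# A 2-D indicator matrix as the target makes this a multi-label fit.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, Y)

# Hypothetical unseen combination built from labels the encoder already knows.
new_input = [{"a4", "b2", "c4"}]
prediction = model.predict(input_encoder.transform(new_input))
predicted = target_encoder.inverse_transform(prediction)
print(predicted)
```

Labels in a new input that were never seen during fitting are ignored by `MultiLabelBinarizer.transform` (with a warning), so inputs made up entirely of unseen labels would carry no signal for the model.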
There is also a "strict" version of the problem where each distinct combination is considered a unique label. Then it would be multi-class classification. But the targets might be too sparse to learn, since many combinations may appear only once or a few times.
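The sparsity concern can be checked directly by counting how often each target combination occurs. The sketch below maps every distinct combination to its own class id and counts the combinations seen only once; the variable names are illustrative, and the data are again the four rows shown in the question.

```python
from collections import Counter

# Target combinations copied from the question.
targets = [
    {"x2", "y4", "z5"},
    {"x2", "y2", "z1"},
    {"x1", "y3", "z5"},
    {"x1", "y3", "z5", "w2"},
]

# frozenset makes each combination hashable and independent of label order.
combo_counts = Counter(frozenset(t) for t in targets)

# One class id per distinct combination, in order of first appearance.
class_ids = {combo: i for i, combo in enumerate(combo_counts)}
y = [class_ids[frozenset(t)] for t in targets]

# Combinations with a single example give a classifier very little to learn from.
singletons = sum(1 for count in combo_counts.values() if count == 1)

print(f"{len(class_ids)} classes for {len(targets)} rows, {singletons} seen only once")
```

On this toy sample every combination is its own class with exactly one example. A multi-class model of this kind can also only ever predict a combination it has already seen in training.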
Answered by Brian Spiering on April 30, 2021