Data Science Asked by yamini goel on February 13, 2021
I have an imbalanced dataset and I wish to predict the class (0 or 1).
Sample x_train:
id   date        c1  c2  ...  c20
101  13-02-2015  2   7   ...  14
101  14-02-2015  24  7   ...  8
...
105  13-02-2015  12  5   ...  4
...
Sample y_train:
id   class
101  1
105  1
107  0
...
Now I wish to oversample class 0 in the dataset, but the problem is that for each id I have just one row in y_train, whereas I have 50 rows for the same id in x_train.
What you have here is called Multi-Instance Learning. From Wikipedia:
In machine learning, multiple-instance learning (MIL) is a type of supervised learning. Instead of receiving a set of instances which are individually labeled, the learner receives a set of labeled bags, each containing many instances.
Source: https://en.wikipedia.org/wiki/Multiple_instance_learning
The approach in this case is different: you need to convert the Multi-Instance Learning problem into a Single-Instance Learning one. One way you can do this is:
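One common option, assumed here for illustration rather than taken from the answer, is to collapse each id's rows (its bag) into a single row of per-column summary statistics. The sketch below uses pandas on a toy stand-in for x_train and y_train; the helper name `bags_to_rows`, the chosen statistics, and the toy values are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for x_train: 3 ids with 4 rows each and two feature columns.
rng = np.random.default_rng(0)
x_train = pd.DataFrame({
    "id": np.repeat([101, 105, 107], 4),
    "c1": rng.integers(0, 30, size=12),
    "c2": rng.integers(0, 30, size=12),
})
# Toy stand-in for y_train: exactly one label per id.
y_train = pd.DataFrame({"id": [101, 105, 107], "class": [1, 1, 0]})


def bags_to_rows(x, feature_cols, stats=("mean", "std", "min", "max")):
    """Collapse each id's rows (one bag) into one row of summary statistics."""
    agg = x.groupby("id")[feature_cols].agg(list(stats))
    # Flatten the (column, statistic) MultiIndex into names like "c1_mean".
    agg.columns = [f"{col}_{stat}" for col, stat in agg.columns]
    return agg.reset_index()


x_flat = bags_to_rows(x_train, ["c1", "c2"])
# x_flat has one row per id, so features and labels join one-to-one.
dataset = x_flat.merge(y_train, on="id", validate="one_to_one")
```

Summary statistics discard the ordering given by the date column; if the order within an id matters, different aggregate features would be needed.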
Then apply SMOTE to the new dataset (which has one row of features and one label per id) and train any Single-Instance Learning model.
You can find details in this Review of Multi-Instance Learning and Its applications
Answered by Tasos on February 13, 2021