TransWikia.com

How to apply supervised machine learning when the target label depends on multiple input rows?

Data Science Asked by AtanuCSE on February 17, 2021

The problem is a multi-label classification problem.
Data
Now, I know how to train and classify using single row with several attributes. For example, if the dataset looks like the first table from the attached file. Here, each row is associated with a single label. Thus, I can train and test after separating the dataset into training and testing sets.
But the problem occurs when classification label / target label depends on multiple rows such as the second table from the attached file. Consecutive N rows makes one category.
Can you please guide me towards a solution?

  1. Is it possible to fit this problem in any existing tool? For example, WEKA or Neural network using Keras.
  2. Or do I have to change the algorithm in order to fit the problem! Is there any existing solution?
  3. Or do I need to modify the rows in such a way that it transforms into one?

3 Answers

The second table is simply saying rows 1 - 4 are 4 different examples of class A, rows 5 - 8 are 4 separate example of class B and the rest are 4 examples of class C. Just modify the table so the target label column has 12 rows the first for having the value A, the next 4 having the value B and the final 4 having the value C.

Good luck!

Answered by kabla002 on February 17, 2021

What about doing a concatenation of your rows (i.e. Attr1 -> Attr12) , such that you now have 3*4 features (because 4 rows of 3 features) as an input to a multiclass classification model?

For instance, first sample would be described by :

X = [1.1, 1.4, 2.5, 2.3, 2.5, 2.7, 1.1, 1.6, 1.9, 1.5, 1.6, 1.7]
y = "A"

Otherwise, there is no issue in giving 2D or 3D inputs towards a classifier. Take for example convolutional neural networks that do operations on images!

Answered by Arnaud on February 17, 2021

Since you seem to have the same number of rows per sample, perhaps the underlying process is such that it makes sense to treat the data as 2D or unpack into 12 features, as @Arnaud describes. (This seems to depend on the four rows being ordered according to some implicit rule?)

More generally though, this is called "multiple instance learning." Probably start with the wikipedia page, sections Assumptions and Algorithms.

Answered by Ben Reiniger on February 17, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP