Data Science Asked by Crazy9 on January 21, 2021
I would like to train a machine learning model with several features as input as X[] and with one output as Y. For example Every sample has a Data frame like this: X[0], X[1], X[2], X[3], X[4], Y
Let’s say One sample the followings Data is only one value: X[0], X[1], X[2], X[4], Y
This is normal machine training problem.
But now, if I would like to set X[3] multiple values for example sample 1 Data is:
X[0] | X[1] | X[2] | X[3] | X[4] | Y
10 | 5 | 6 | [10, 20, 30, 40, 50] | 7 | 90
Data in sample 2 is:
X[0] | X[1] | X[2] | X[3] | X[4] | Y
11 | 7 | 5 | [20, 30, 40, 50, 60] | 3 | 80
Is this possible to follow the normal machine training process and got a model which could calculate a sample with other example with Data like:
X[0] | X[1] | X[2] | X[3] | X[4] | Y
10.5 | 6 | 5.5 | [15, 25, 35, 45, 55] | 5 | ???
If the length for each X[3] is not long, it is possible to divide the X[3] into multiple new features, but if the length of X[3] is very long (len > 1000) with different distribution, making binary is also lead to too many new features. Is there any way to treat the X[3] directly without adding new features?
Really appreciate for your help.
There are many options.
The most common method to handle variable length vectors is to pad the vectors. Add zeros as placeholders to the shorter length vectors until all the vectors are the same length. Then it can be modeled similar to any other feature vector.
Another option is to take the norm of the vector. That would yield a single scalar that could be used in the machine learning model.
Ultimately, the way to handle a feature vector that could be variable lengths depends on what the feature represents.
Answered by Brian Spiering on January 21, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP