Data Science Asked by user277471 on April 17, 2021
Please let me know what to do when a value in the testing set is bigger than the max value used to min-max normalize the training set when building a histogram classifier.
Do I go back and change the bounds of the min-max normalization for the training set? Wouldn’t that violate the notion that your training set should generalize to any testing set on its own, and that you should not retroactively change what was done while building the classifier on the training set based on future testing sets that you are not supposed to know?
Do I change the bounds of the min-max normalization to the min and max of the testing set? But you are supposed to use the same transformation on the testing set as on the training set, right?
Or, do I just add a bin on either side of the normalized histogram, such that every value that gets transformed above (below) the interval [0,1] goes into the bin for all values above (below) the interval?
Or, do I just exclude values that get transformed outside of the histogram’s interval?
None of these seem right. Please let me know if I am missing an option.
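To make the options concrete, here is a minimal sketch of the "extra bin on either side" idea, assuming the bounds are learned from the training set only and reused unchanged on the testing set. The function names (fit_min_max, normalize, bin_index), the toy values, and the choice of 4 regular bins are all hypothetical and only for illustration.

```python
def fit_min_max(train_values):
    """Learn the normalization bounds from the training set only."""
    return min(train_values), max(train_values)


def normalize(x, lo, hi):
    """Apply the training-set transform; results may fall outside [0, 1]."""
    return (x - lo) / (hi - lo)


def bin_index(z, n_bins):
    """Map a normalized value to a histogram bin.

    Bin 0 is the underflow bin (z < 0), bins 1..n_bins cover [0, 1],
    and bin n_bins + 1 is the overflow bin (z > 1).
    """
    if z < 0.0:
        return 0
    if z > 1.0:
        return n_bins + 1
    # z == 1.0 belongs to the last regular bin, not the overflow bin.
    return min(int(z * n_bins), n_bins - 1) + 1


# Toy data (made up): the test set contains values below and above
# the range seen during training.
train = [2.0, 4.0, 6.0, 10.0]
test = [1.0, 5.0, 10.0, 12.0]

lo, hi = fit_min_max(train)
test_bins = [bin_index(normalize(x, lo, hi), 4) for x in test]
print(test_bins)
```

With this layout the training-set transform is never changed after the fact, and out-of-range test values are kept rather than dropped. Whether the classifier can say anything useful about the underflow and overflow bins depends on how it treats bins that had no training data.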
First build the min-max normalization on the entire data set. Then run the training and testing workloads on your train and test split(s) separately. If you believe your data set will change over time, and you still want to use min-max normalization, you could consider estimating the extreme min and max the values can take instead of relying on the observed ones.
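As a rough sketch of what "estimating the extreme min and max" could look like, the snippet below widens the observed range by a fixed fraction before normalizing. The padding rule, the helper name padded_bounds, and the 25% margin are assumptions chosen purely for illustration; domain knowledge about the physical limits of the feature may give better bounds.

```python
def padded_bounds(values, margin=0.25):
    """Estimate extreme bounds by widening the observed range.

    `margin` is the fraction of the observed range added on each side.
    This is an illustrative heuristic, not a statistically justified one.
    """
    lo, hi = min(values), max(values)
    pad = margin * (hi - lo)
    return lo - pad, hi + pad


def normalize(x, lo, hi):
    """Min-max normalize x with fixed bounds lo and hi."""
    return (x - lo) / (hi - lo)


# Toy data (made up for the example).
observed = [2.0, 4.0, 6.0, 10.0]
later_values = [1.0, 5.0, 10.0, 12.0]

lo, hi = padded_bounds(observed)
normalized_later = [normalize(x, lo, hi) for x in later_values]
print(lo, hi, normalized_later)
```

Values that still fall outside the padded bounds would need a fallback, such as clipping or dedicated outer bins.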
Answered by ggordon on April 17, 2021