Data Science Asked by Rutrasss on July 18, 2021
I was looking at the source code of MinMaxScaler on GitHub. I know that when you fit a preprocessing class to a dataset, it takes the data and prepares it for transformation.
Let's say I fitted MinMaxScaler to X_train and transformed it. But how does transform work when I use another dataset, say X_test? When you call transform(), does it replace the dataset in use?
Models are trained on training data and evaluated on test data, under the assumption that the unseen test data comes from the same distribution as the training data. Under that same assumption, the statistics you compute from the training data should be used to apply the same transformation to the test data. So fit MinMaxScaler on your training data only, then use that fitted scaler to transform both the training data and the test data. Fitting on data that includes the test set causes data leakage; see the related question: StandardScaler before and after splitting data
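A minimal sketch of that workflow, assuming scikit-learn and NumPy are installed. The arrays are made-up toy values, not data from the question:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data, invented for illustration: one feature whose training range is [0, 10].
X_train = np.array([[0.0], [5.0], [10.0]])
X_test = np.array([[2.5], [12.0]])

scaler = MinMaxScaler()
scaler.fit(X_train)  # learns the min and max from the training data only

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuses the training min and max

print(X_train_scaled.ravel())  # approximately [0.0, 0.5, 1.0]
print(X_test_scaled.ravel())   # approximately [0.25, 1.2]
```

Note that the test value 12.0 lies above the training maximum, so with the default settings it is scaled to a value above 1. That is expected: the scaler keeps using the range it saw during fit.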
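For these transformers, the fit method extracts the relevant statistics from the provided data (minimum and maximum for min-max scaling, mean and standard deviation for standardization), and the transform method scales each feature individually using those stored statistics. Calling transform on X_test therefore does not refit the scaler or replace anything; it applies the statistics learned from X_train to the new data.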
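To make the fit/transform split concrete, here is a simplified pure-Python sketch. TinyMinMaxScaler is a hypothetical stand-in written for this illustration; it is not scikit-learn's actual implementation, and it assumes every feature has max > min:

```python
class TinyMinMaxScaler:
    """Hypothetical, simplified stand-in for illustration only."""

    def fit(self, rows):
        # Store only per-feature statistics, not the data itself.
        columns = list(zip(*rows))
        self.mins = [min(col) for col in columns]
        self.maxs = [max(col) for col in columns]
        return self

    def transform(self, rows):
        # Scale each feature independently with the stored statistics.
        # Assumes max > min for every feature, to keep the sketch short.
        return [
            [(x - lo) / (hi - lo) for x, lo, hi in zip(row, self.mins, self.maxs)]
            for row in rows
        ]


# Made-up toy data: two features per row.
train = [[0.0, 100.0], [10.0, 300.0]]
test = [[5.0, 150.0]]

scaler = TinyMinMaxScaler().fit(train)
stats_before = (list(scaler.mins), list(scaler.maxs))

test_scaled = scaler.transform(test)
print(test_scaled)  # [[0.5, 0.25]]

# transform() left the fitted statistics untouched.
print((scaler.mins, scaler.maxs) == stats_before)  # True
```

The fitted object holds only the per-feature minimum and maximum, so passing a different dataset to transform changes the output but never the scaler's state.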
Answered by tkarahan on July 18, 2021