Data Science Asked by nishanth reddy on December 24, 2020
I am confused about which of the following should be used for standardization:
Method 1: fit_transform on the training data, and only transform on the test data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Method 2: fit_transform on both the training and the test data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
# scaler_train = sc.fit(X_train)
# X_train_sd = scaler_train.transform(X_train)
X_test = sc.fit_transform(X_test)
# scaler_test = sc.fit(X_test)
# X_test_sd = scaler_train.transform(X_test)
This is a follow-up question to:
StandardScaler before and after splitting data
You should only fit your scaler on the training data, as in method 1. Your scaler is part of your model, and fitting it to some data amounts to learning from that data.
Test data is used to evaluate your model on previously unseen data, so if you fit your scaler to test data, it is not "unseen" data anymore.
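A minimal sketch of this point, on synthetic data that is not from the question. The array shapes, the random seed, and the choice of LogisticRegression are assumptions made only for illustration. It shows that the scaler's statistics come from the training set alone, and that wrapping the scaler in a scikit-learn pipeline treats it as part of the model, so fitting never touches the test data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative (assumed, not from the question).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = (X[:, 0] + X[:, 1] > 10.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Method 1: learn the mean and standard deviation from the training data only.
sc = StandardScaler()
X_train_sd = sc.fit_transform(X_train)
X_test_sd = sc.transform(X_test)  # reuses the training mean and std

# The fitted statistics are exactly those of the training set.
train_stats_match = np.allclose(sc.mean_, X_train.mean(axis=0)) and np.allclose(
    sc.scale_, X_train.std(axis=0)
)

# The scaled test set is usually not exactly zero-mean; that is expected,
# because its own statistics were never used.
test_mean_after_scaling = X_test_sd.mean(axis=0)

# Treating the scaler as part of the model: the pipeline fits the scaler
# on X_train inside fit() and only calls transform() on X_test inside score().
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

With a pipeline, the same guarantee carries over to cross-validation, since the scaler is refit on the training portion of each fold.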
Answered by Adam Oudad on December 24, 2020