Data Science Asked by skrrrt on September 5, 2021
I have standardized training set using
mean = XTrain.mean()
XTrain-=mean
std = XTrain.std()
XTrain/=std
And then used mean
and std
to standardize validation and test sets. The training and validation sets have values that are greater than 1 and less than zero is that okay?
Standardization centers the values around a mean of $0$ with standard deviation $1$. Therefore, having values smaller than $0$ or greater than $1$ is to be expected. If you want to make sure values are between $0$ and $1$ you need to normalize the data instead.
Here is an example of the two procedures taken from the book "Python Machine Learning" by Raschka:
Be aware though to apply the procedure to your test data with parameters obtained from the training data (in case of standardization: mean and std. dev. of train data).
Sklearn has methods for standardization and normalization which you might want to have a look at.
Correct answer by Sammy on September 5, 2021
You're measuring how many standard deviations from the mean a given value is. Certainly values can be many standard deviations from the mean. Even for data with a normal distribution, we expect about $5%$ of the observations to be more than $2$ standard deviations from the mean, and we expect $32%$ of the observations to be more than $1$ standard deviation from the mean.
Therefore, it is not at all concerning that you have values more than $1$.
As far as values less than $0$ go, all that means is that you have a value less than the mean. This is common. (While it can happen, consider how to have a data set where no values are less than the mean.)
As Sammy mentioned mere seconds before I posted, be sure to use the mean and standard deviation from your training data when you transform the test and validation data.
Answered by Dave on September 5, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP