
Feature scaling for MLP neural network sklearn

Data Science Asked by Joseph Hodson on March 6, 2021

I am working with a dataset whose features are on very different scales. Before running sklearn's MLP neural network I read around and found a variety of opinions on feature scaling: some say you need to normalize, some say to only standardize, others say that in theory nothing is needed for an MLP, some say to scale only the training data and not the test data, and the sklearn documentation says MLP is sensitive to feature scaling. This has left me very confused about which route I should take for my dataset before running the MLP model. Any clarification on these points and how I should proceed would be much appreciated.

2 Answers

In short:

  • Scaling is indeed desired.
  • Standardizing and normalizing should both be fine; any reasonable scaling should work.
  • Of course you do need to scale your test set, but you do not "train" (i.e. fit) your scaler on the test data - you scale it using a scaler fitted on the train data (this is very natural to do in SKLearn).
    • For example, if you're standardizing your data (with an SKLearn StandardScaler object), you .fit it on the train data to learn its mean and standard deviation, and then you .transform both the train and the test data, which subtracts the train mean and divides by the train standard deviation. A minimal sketch follows below.
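
A minimal sketch of that fit/transform split in SKLearn (the arrays here are just placeholders, not your actual data):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy arrays just for illustration - replace with your own train/test features
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_test = np.array([[1.5, 250.0]])

scaler = StandardScaler()
scaler.fit(X_train)                        # mean and sd are learned from the train data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # the train mean/sd are reused on the test data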

Answered by Itamar Mushkin on March 6, 2021

Some ML algorithms require standardised data and others simply work better with it. In the case of neural nets (NN), standardisation often improves performance, since NNs can have a hard time dealing with features on very different scales.

What you can do is standardise each $x$ column to have mean 0 and standard deviation (sd) 1 (subtract the mean and divide by the sd). You do this based on the training data only. To make predictions (e.g. on a test set or new data), you apply exactly the same transformation - with the mean and sd obtained from the training data - to the test or new data.

The reason you use only the training data to get the mean/sd for standardisation is to avoid data leakage: no information about the test data should enter the training process.
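
If you also cross-validate, one way to keep this leakage-free is to wrap the scaler and the model in an SKLearn Pipeline, so the scaler is re-fitted on the training part of each fold only. A small sketch (the toy data here is just for illustration):

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Toy data just for illustration - replace with your own X and y
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The scaler is re-fitted on the training part of every CV fold,
# so no information from the held-out fold leaks into training
model = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0))
scores = cross_val_score(model, X, y, cv=5)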

After standardisation the unit of each variable changes: where you originally had "apples", you now have "standard deviations". This can be relevant, e.g. when you want to do statistical inference (rather than pure prediction).

Standardisation in Python:

# Get mean and SD from train data
mean = train_data.mean(axis=0)
std = train_data.std(axis=0)

# Standardise train and test data using the train mean/sd
train_data -= mean
train_data /= std
test_data -= mean
test_data /= std

There are also other ways to "rescale" your data, e.g. min-max scaling, which also often works well with NNs. The different approaches and terms are well described on Wikipedia.
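
For example, a min-max version of the same train/test logic could look like this with SKLearn's MinMaxScaler (again with placeholder arrays):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy arrays just for illustration - replace with your own train/test features
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_test = np.array([[2.5, 350.0]])

scaler = MinMaxScaler()                      # scales each column to [0, 1] based on the train min/max
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)     # test values may fall outside [0, 1]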

Brief example in R:

The vector apples has one extreme value. After standardisation, the new vector apples_st has a mean of (almost) zero and an sd equal to 1. Looking at apples_st, you see that the extreme value is now on a much smaller numeric scale, although it remains an outlier relative to the other values (standardisation is a linear transformation, so it does not remove outliers).

apples = c(1,2,3,4,5,100)
apples_st = (apples - mean(apples)) / sd(apples)

mean(apples_st)
[1] -9.251859e-18

sd(apples_st)
[1] 1

apples_st
[1] -0.4584610 -0.4332246 -0.4079882 -0.3827518 -0.3575154  2.0399410

Answered by Peter on March 6, 2021
