Data Science Asked by Heisenbug on January 6, 2021
I understand what StandardScaler does and what Normalizer does, per the scikit-learn documentation: Normalizer, StandardScaler.
I know when StandardScaler is applied. But in which scenario is Normalizer applied? Are there scenarios where one is preferred over the other?
StandardScaler: It transforms the data so that each feature has mean 0 and standard deviation 1. In short, it standardizes the data. Standardization is useful for data which has negative values. It rescales the data to match the mean and standard deviation of a standard normal distribution (it does not change the shape of the distribution). It is more useful in classification than regression.
You can read this blog of mine.
Normalizer: It scales each sample to unit norm, so every value ends up between -1 and 1 (between 0 and 1 for non-negative data). It performs normalization. Due to the decreased range and magnitude, the gradients in the training process do not explode and you do not get high values of loss. It is more useful in regression than classification.
You can read this blog of mine.
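To make the difference concrete, here is a minimal sketch contrasting the two transformers (the array values are made up for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler, Normalizer

# Two samples, two features (made-up values)
X = np.array([[1.0, 10.0],
              [3.0, 30.0]])

# StandardScaler works per column: each feature ends up with mean 0, std 1
print(StandardScaler().fit_transform(X))
# [[-1. -1.]
#  [ 1.  1.]]

# Normalizer works per row: each sample is rescaled to unit l2 norm
print(Normalizer(norm="l2").fit_transform(X))
# [[0.0995 0.995]
#  [0.0995 0.995]]  (rounded)

Note that after Normalizer both rows become identical: only the direction of each sample is kept, not its magnitude.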
Answered by Shubham Panchal on January 6, 2021
They are used for two different purposes.
StandardScaler changes each feature column $f_{:,i}$ to $$f'_{:,i} = \frac{f_{:,i} - \text{mean}(f_{:,i})}{\text{std}(f_{:,i})}.$$
Normalizer changes each sample $x_n=(f_{n,1},...,f_{n,d})$ to $$x'_n = \frac{x_n}{\text{size}(x_n)},$$ where $\text{size}(x_n)$ for the l1 norm is $\left \| x_n \right \|_1=|f_{n,1}|+...+|f_{n,d}|$, for the l2 norm is $\left \| x_n \right \|_2=\sqrt{f^{2}_{n,1}+...+f^{2}_{n,d}}$, and for the max norm is $\left \| x_n \right \|_\infty=\max\{|f_{n,1}|,...,|f_{n,d}|\}$.
To illustrate the contrast, consider the data set $\{1, 2, 3, 4, 5\}$, which consists of 5 one-dimensional data points (each data point has one feature).
After applying StandardScaler, the data set becomes $\{-1.41, -0.71, 0., 0.71, 1.41\}$.
After applying any type of Normalizer, the data set becomes $\{1., 1., 1., 1., 1.\}$, since the only feature is divided by itself. So Normalizer has no use for this case.
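This contrast can be checked directly with scikit-learn (a minimal sketch of the example above):

import numpy as np
from sklearn.preprocessing import StandardScaler, Normalizer

X = np.array([[1.], [2.], [3.], [4.], [5.]])  # 5 samples, 1 feature

print(StandardScaler().fit_transform(X).ravel())
# [-1.41421356 -0.70710678  0.  0.70710678  1.41421356]

print(Normalizer(norm="l2").fit_transform(X).ravel())
# [1. 1. 1. 1. 1.]  -- each sample is divided by its own (only) value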
Also, when features have different units, e.g. $(height, age, income)$, Normalizer is not used as a pre-processing step, although it might be used as an ad-hoc feature engineering step, similar to what a neuron does in a neural network.
As mentioned in this answer, Normalizer is mostly useful for controlling the size of a vector in an iterative process, e.g. a parameter vector during training, to avoid numerical instabilities due to large values.
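As a rough illustration of that use (the update loop and numbers below are invented for the sketch, not taken from any specific algorithm), a vector can be rescaled to unit norm after each iteration so its magnitude stays bounded:

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)           # some parameter vector

for _ in range(100):
    w += rng.normal(size=5)      # stand-in for an update step
    w /= np.linalg.norm(w)       # unit l2 norm, same effect as Normalizer on one sample

print(np.linalg.norm(w))         # always 1.0, so the values cannot blow up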
Answered by Esmailian on January 6, 2021
I don't feel like the previous answers answered the question at all. So I'll give a quite comprehensive explanation, with two concrete use cases at the end.
Normalizer normalizes rows (samplewise), not columns (featurewise). It totally changes the meaning of the data, because the distributions of the resulting feature values are totally changed. Therefore, a scenario where it can be useful is when you consider a feature to be the relation between feature values within a sample, rather than across samples.
For example, take this dataset:
weight age
0 45 87
1 40 13
2 56 84
After using Normalizer(norm="l2"), it becomes:
weight age
0 0.46 0.89
1 0.95 0.31
2 0.55 0.83
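These numbers can be reproduced in a few lines (a minimal sketch):

import pandas as pd
from sklearn.preprocessing import Normalizer

df = pd.DataFrame({"weight": [45, 40, 56], "age": [87, 13, 84]})

# Each row is divided by its own l2 norm, e.g. row 0 by sqrt(45**2 + 87**2)
ndf = pd.DataFrame(Normalizer(norm="l2").fit_transform(df), columns=df.columns)
print(ndf.round(2))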
As you can see, the distribution of samples at the feature level changed in several respects:
argsort(weight) gave [1, 0, 2]; it now gives [0, 2, 1]. It doesn't change for age, but that is just by chance; on a bigger dataset it would change with very high probability.
age[0] was 6.7 times bigger than age[1]; it is now only 2.9 times bigger.
Normalizer builds totally new features that are not correlated to the initial features. Run the Python code provided at the end of this answer to observe the phenomenon.
StandardScaler and other scalers that work featurewise are preferred when the meaningful information is located in the relation between feature values from one sample to another, whereas Normalizer and other scalers that work samplewise are preferred when the meaningful information is located in the relation between feature values from one feature to another.
For example, several studies have shown that weight correlates with lifespan in humans and other mammals (after adjusting for sex, height, geographic origin, etc.). As a result, you can see heavy old people as anomalies. One may be interested in why some heavy people live long and why some thin people die young. Then one may want to look at patterns between weight and age: maybe there exist different groups, each with its own mediator variable between weight and lifespan, etc. As you can see, this amounts to a clustering task on Normalized features.
Another example is when you want to cluster documents by topic. To some extent, what defines a topic is the frequency of each word relative to the others in the document. For example, topic 'statistics' may be characterized by a relative frequency of the word 'variance' over the word 'apple' of 12345 (these words and frequencies are random; in real life you would use far more than 2 words). Topic 'verbiage' may be characterized by a high prominence of linking words and adverbs relative to nouns and verbs.
Therefore, if your initial features are the frequency of each word in the document (from a predefined dictionary), you can use Normalizer to get the appropriate relative features that you want. This example is provided by scikit-learn in "Clustering text documents using k-means".
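As a hedged sketch of that idea (the two toy documents below are invented), raw word counts can be turned into relative frequencies with Normalizer before clustering:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import Normalizer

docs = ["variance variance apple", "apple apple apple variance"]  # toy documents

counts = CountVectorizer().fit_transform(docs)      # raw word counts per document
rel = Normalizer(norm="l1").fit_transform(counts)   # l1 norm turns counts into proportions

print(rel.toarray())
# [[0.33... 0.66...]   proportions of 'apple' vs 'variance' in doc 0
#  [0.75    0.25  ]]   proportions of 'apple' vs 'variance' in doc 1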
Lastly, in case you ask, Normalizer scales to unit norm for practical numerical reasons (stability, convergence speed, interpretation, etc.), just like StandardScaler.
Requirements: seaborn==0.11.0

import numpy.random as rd
import pandas as pd
from sklearn.preprocessing import Normalizer
import seaborn as sns

# Random skewed data: 100 samples, 2 features
shape = (100, 2)
df = pd.DataFrame(rd.rand(*shape) * rd.lognormal(1, 0.4, shape), columns=["weight", "age"])

# Samplewise l2 normalization of the same data
ndf = pd.DataFrame(Normalizer(norm="l2").fit_transform(df), columns=["norm_weight", "norm_age"])

# Compare the feature distributions before and after normalization
sns.kdeplot(data=pd.concat([df, ndf], axis=1))

# One pairplot per dataframe; each sample gets its own hue
for d in [df, ndf]:
    sns.pairplot(d.reset_index(), hue="index", diag_kind=None)
On the pairplot of the normalized data (third figure), norm_weight as a function of norm_age makes a circle arc. That is because the $L_2$ norm places the data points on the unit circle. Indeed, the features are built such that norm_weight ** 2 + norm_age ** 2 == 1.
Answered by Alexandre Huat on January 6, 2021