Difference between OrdinalEncoder and LabelEncoder

Question

I was going through the official scikit-learn documentation after reading a book on ML and came across the following:

The documentation describes sklearn.preprocessing.OrdinalEncoder(), whereas the book describes sklearn.preprocessing.LabelEncoder(). When I checked their functionality, they looked the same to me. Can someone please tell me the difference between the two?

ipramusinto · Answer

Afaik, both have the same functionality. The small difference is the idea behind them: OrdinalEncoder is for converting features, while LabelEncoder is for converting the target variable.

That's why OrdinalEncoder can fit data that has the shape (n_samples, n_features), while LabelEncoder can only fit data that has the shape (n_samples,). (In the past, people used LabelEncoder inside a loop to handle what is now the job of OrdinalEncoder.)
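
A minimal sketch of the shape difference described above. The toy data and variable names are assumptions made up for illustration; they are not from the question or the documentation.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical toy data: two categorical feature columns and one target.
X = np.array([["red", "small"], ["blue", "large"], ["red", "large"]], dtype=object)
y = np.array(["cat", "dog", "cat"])

# OrdinalEncoder takes the whole 2D feature matrix at once.
X_encoded = OrdinalEncoder().fit_transform(X)   # shape (3, 2), float

# LabelEncoder takes the 1D target.
y_encoded = LabelEncoder().fit_transform(y)     # shape (3,), int

# The older workaround: one LabelEncoder per feature column, in a loop.
X_looped = np.column_stack(
    [LabelEncoder().fit_transform(X[:, j]) for j in range(X.shape[1])]
)
```

On this toy data the looped result holds the same codes as the OrdinalEncoder output; only the dtype differs (integers instead of floats).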

The Red Pea · Answer

As for differences in OrdinalEncoder and LabelEncoder implementation, the accepted answer mentions the shape of the data: OrdinalEncoder is for 2D data, shape (n_samples, n_features); LabelEncoder is for 1D data, shape (n_samples,).

That's why an OrdinalEncoder raises an error if you try to fit it on 1D data:

    OrdinalEncoder().fit(['a','b'])
    # ValueError: Expected 2D array, got 1D array instead: ...

However, another difference between the encoders is the name of their learned parameter:

LabelEncoder learns classes_
OrdinalEncoder learns categories_

Notice the differences in fitting LabelEncoder vs OrdinalEncoder, and the differences in the values of these learned parameters. LabelEncoder.classes_ is a 1D array, while OrdinalEncoder.categories_ is a list of arrays, one per feature.

    LabelEncoder().fit(['a','b']).classes_
    # >>> array(['a', 'b'], dtype='<U1')

    OrdinalEncoder().fit([['a'],['b']]).categories_
    # >>> [array(['a', 'b'], dtype=object)]

Other encoders that work in 2D, including OneHotEncoder, also use the property categories_. (The dtype <U1 means a little-endian Unicode string of length 1.)

Both encoders also choose their integers in the same way. For example:

    OrdinalEncoder().fit_transform([['cold'],['warm'],['hot']]).reshape((1,3))
    # >>> array([[0., 2., 1.]])

    LabelEncoder().fit_transform(['cold','warm','hot'])
    # >>> array([0, 2, 1], dtype=int64)

Notice how both encoders assigned integers in alphabetical order: 'c' < 'h' < 'w'. But this part is important: neither encoder got the "real" order correct. The real order should reflect the temperature, 'cold' < 'warm' < 'hot'; based on that order, 'warm' would have been assigned the integer 1.

In the blog post referenced by Piotr, the author does not even use OrdinalEncoder(). To achieve ordinal encoding, the author does it manually, mapping each temperature to a "real" order integer with a dictionary like {'cold':0, 'warm':1, 'hot':2}:

"Refer to this code using Pandas, where first we need to assign the real order of the variable through a dictionary... Though its very straight forward but it requires coding to tell ordinal values and what is the actual mapping from text to integer as per the order."

In other words, if you're wondering whether to use OrdinalEncoder, please note OrdinalEncoder may not actually provide "ordinal encoding" the way you expect!
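
If the real order is known, one option is to pass it explicitly through OrdinalEncoder's categories parameter; another is a hand-written dictionary mapping with pandas. The sketch below reuses the temperature values from this answer; the DataFrame and variable names are assumptions chosen only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

temps = pd.DataFrame({"temp": ["cold", "warm", "hot"]})

# Default behaviour: categories are sorted alphabetically (cold < hot < warm).
default_codes = OrdinalEncoder().fit_transform(temps[["temp"]]).ravel()

# Explicit order: one list of categories per feature column.
encoder = OrdinalEncoder(categories=[["cold", "warm", "hot"]])
ordered_codes = encoder.fit_transform(temps[["temp"]]).ravel()

# Manual alternative: map each value to its position with a dictionary.
manual_codes = temps["temp"].map({"cold": 0, "warm": 1, "hot": 2})

print(default_codes)          # [0. 2. 1.]
print(ordered_codes)          # [0. 1. 2.]
print(manual_codes.tolist())  # [0, 1, 2]
```

With an explicit categories list, 'warm' gets the integer 1, matching the temperature order rather than the alphabetical one.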

Piotr Rarus - Reinstate Monica · Answer

You use ordinal encoding to preserve the order of categorical data, e.g. cold, warm, hot; low, medium, high. You use label encoding or one-hot encoding for categorical data where there's no order, e.g. dog, cat, whale. Check this post on Medium; it explains these concepts well.
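
For the unordered case, a small one-hot sketch on the dog/cat/whale example. The array and variable names are illustrative assumptions, not taken from the linked post.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One feature column with three unordered categories.
animals = np.array([["dog"], ["cat"], ["whale"]])

encoder = OneHotEncoder()
# The default output is a sparse matrix; convert it to a dense array to inspect it.
one_hot = encoder.fit_transform(animals).toarray()

print(encoder.categories_)  # learned categories, sorted alphabetically
print(one_hot)              # one column per category, a single 1 per row
```

Each category gets its own column, so no artificial order between dog, cat and whale is implied.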

Answered by Piotr Rarus - Reinstate Monica on October 12, 2020
