Data Science Asked on April 10, 2021
I am trying to do ordinal encoding using:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
I will try to explain my problem with a simple dataset.
X = pd.DataFrame({'animals':['low','med','low','high','low','high']})
enc = OrdinalEncoder()
enc.fit_transform(X.loc[:,['animals']])
array([[1.],
[2.],
[1.],
[0.],
[1.],
[0.]])
It is labelling alphabetically, but if I try:
enc = OrdinalEncoder(categories=['low','med','high'])
enc.fit_transform(X.loc[:,['animals']])
Shape mismatch: if n_values is an array, it has to be of shape (n_features,).
Which I do not understand. I would like to be able to decide how the labelling is done.
I considered doing this:
level_mapping={'low':0,'med':1,'high':2}
X['animals'] = X['animals'].replace(level_mapping)
However, I have a large number of features in my dataset that share similar categories, so mapping each column by hand does not scale well.
Thanks.
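I'm not sure if you ever figured this out, but I was looking for an answer to this exact question and couldn't find a good one. I finally figured it out, though. OrdinalEncoder can encode multiple columns of a dataframe at once, so the categories parameter expects a list of lists: one inner list per column, in the same order as the columns you pass in. A flat list like ['low','med','high'] is read as three separate columns, which is why you get the shape mismatch error. So, when you instantiate OrdinalEncoder(), you give it something like this: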
enc = OrdinalEncoder(categories=[list_of_values_cat1, list_of_values_cat2, etc])
Specifically, in your example above, you would just put ['low', 'med', 'high'] inside another list:
enc = OrdinalEncoder(categories=[['low', 'med', 'high']])
enc.fit_transform(X.loc[:,['animals']])
>>array([[0.],
[1.],
[0.],
[2.],
[0.],
[2.]])
# Now 'low' is correctly mapped to 0, 'med' to 1, and 'high' to 2
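If you want to double-check which order the encoder actually used, the fitted categories_ attribute and inverse_transform are handy. This is only a minimal sketch, assuming the same toy 'animals' column from the question; the variable names codes, learned_order and restored are just for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

X = pd.DataFrame({'animals': ['low', 'med', 'low', 'high', 'low', 'high']})

# One inner list per column being encoded; here there is a single column.
enc = OrdinalEncoder(categories=[['low', 'med', 'high']])
codes = enc.fit_transform(X[['animals']])

# categories_ holds one array per column, in the order used for encoding.
learned_order = list(enc.categories_[0])

# inverse_transform maps the numeric codes back to the original labels.
restored = enc.inverse_transform(codes)

print(learned_order)
print(restored.ravel())
```

The position of each label in learned_order is the integer it gets encoded to.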
To see how you can encode multiple columns with their own individual ordinal values, try this:
# Sample dataframe with 2 ordinal categorical columns: 'temp' and 'place'
categorical_df = pd.DataFrame({'my_id': ['101', '102', '103', '104'],
'temp': ['hot', 'warm', 'cool', 'cold'],
'place': ['third', 'second', 'first', 'second']})
# In the 'temp' column, I want 'cold' to be 0, 'cool' to be 1, 'warm' to be 2, and 'hot' to be 3
# In the 'place' column, I want 'first' to be 0, 'second' to be 1, and 'third' to be 2
temp_categories = ['cold', 'cool', 'warm', 'hot']
place_categories = ['first', 'second', 'third']
# Now, when you instantiate the encoder, both of these lists go in one big categories list:
encoder = OrdinalEncoder(categories=[temp_categories, place_categories])
encoder.fit_transform(categorical_df[['temp', 'place']])
>>array([[3., 2.],
[2., 1.],
[1., 0.],
[0., 1.]])
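Since the question mentions many features sharing the same categories, one way to avoid writing the list out for every column is to repeat it once per column. This is a rough sketch, not the only way to do it; the dataframe and the column names 'plants' and 'fungi' are made up for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical frame: three columns that share the same low/med/high scale.
df = pd.DataFrame({
    'animals': ['low', 'med', 'high'],
    'plants': ['high', 'low', 'med'],
    'fungi': ['med', 'med', 'low'],
})

levels = ['low', 'med', 'high']
ordinal_cols = ['animals', 'plants', 'fungi']

# Repeat the shared ordering once per column so the outer list
# has the same length as the number of columns being encoded.
encoder = OrdinalEncoder(categories=[levels] * len(ordinal_cols))

# Wrap the result in a DataFrame to keep the column names and index.
encoded = pd.DataFrame(
    encoder.fit_transform(df[ordinal_cols]),
    columns=ordinal_cols,
    index=df.index,
)

print(encoded)
```

Columns with a different scale can still be mixed in by building the outer list by hand, as in the 'temp' and 'place' example above.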
Answered by fugumagu on April 10, 2021