TransWikia.com

How to use SKLEARN PIPELINE

Data Science Asked by Jacques Nel on June 4, 2021

I have a question on using sklearn pipelines to predict/classify data.
I understand how to build a pipeline, train it with training data, test it with test data; but after that is where I get lost. How do I use the pipeline to predict values on new/unseen data?

I built a pipeline which transforms categorical data with a OneHotEncoder and another that transforms numerical data by scaling the columns with a StandardScaler. I then used FeatureUnion to combine the two pipelines and join a DecisionTreeClassifier() at the end, to classify the data.
I fit the training data successfully with pipeline.fit(X_train, y_train) and then predict with pipeline.predict(X_test). This all works fine.

Next I want to use the pipeline to classify new/unseen data, but when I call pipeline.predict(X_unseen) I get a ValueError. The error relates to one of the categorical features of the data, which holds the names of cities. The pipeline does not seem to transform the unseen data.

Reading through the documentation on pipelines and several examples, I understand that when .fit() and .predict() are called the data is passed through the entire pipeline. If my understanding is correct, then pipeline.predict(X_unseen) should pass the new data through the pipeline, transforming it and then classifying it. However, this does not seem to be the case. Can anyone tell me what I’m missing or misunderstanding?
How do I use the pipeline I built to predict on new data?

One Answer

Could you share the code? It is hard to debug without it.

When you call pipeline.predict, it calls the transform method of every intermediate step and then the predict method of the final estimator.
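To make that concrete, here is a minimal sketch on invented toy data (the arrays and the step names "scale" and "tree" are assumptions for illustration, not taken from the question). It compares pipeline.predict with doing the same thing by hand using the already-fitted steps:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy numeric data, invented for illustration only.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y_train = np.array([0, 0, 1, 1])
X_new = np.array([[1.5, 15.0], [3.5, 35.0]])

# Hypothetical two-step pipeline: one transformer, one classifier.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("tree", DecisionTreeClassifier(random_state=0)),
])
pipeline.fit(X_train, y_train)

# Prediction through the pipeline.
pred_pipeline = pipeline.predict(X_new)

# The same thing by hand: transform with the fitted scaler
# (no refitting on the new data), then predict with the fitted tree.
X_scaled = pipeline.named_steps["scale"].transform(X_new)
pred_manual = pipeline.named_steps["tree"].predict(X_scaled)

print((pred_pipeline == pred_manual).all())
```

So the unseen data is transformed, but only with what each step learned during fit. Nothing is refit on the new rows.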

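It might be that you have not told the OneHotEncoder how to handle categories it did not see during fit, such as a new city name. By default it raises an error on unknown categories; with handle_unknown='ignore' it encodes them as all zeros instead: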

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
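Since the original code was not posted, the following is only a sketch of how the setting might behave inside a full pipeline. The column names "city" and "income", the data, and the build_pipeline helper are all hypothetical, and it uses ColumnTransformer rather than the FeatureUnion described in the question:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Invented training data: one categorical and one numerical column.
X_train = pd.DataFrame({
    "city": ["Paris", "London", "Paris", "Berlin"],
    "income": [30.0, 45.0, 52.0, 38.0],
})
y_train = [0, 1, 1, 0]

# New data containing a city that was never seen during fit.
X_unseen = pd.DataFrame({"city": ["Madrid"], "income": [41.0]})


def build_pipeline(handle_unknown):
    """Hypothetical helper: encode 'city', scale 'income', then classify."""
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown=handle_unknown), ["city"]),
        ("num", StandardScaler(), ["income"]),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ])


# Default behaviour: an unknown category raises ValueError at predict time.
strict = build_pipeline("error").fit(X_train, y_train)
try:
    strict.predict(X_unseen)
    strict_failed = False
except ValueError as exc:
    strict_failed = True
    print("strict pipeline failed:", exc)

# With 'ignore', the unknown city is encoded as all zeros and predict works.
tolerant = build_pipeline("ignore").fit(X_train, y_train)
predictions = tolerant.predict(X_unseen)
print(predictions)
```

Note that 'ignore' only avoids the error. The classifier still has no information about the new city, so the prediction for such rows rests on the remaining features.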

Answered by Carlos Mougan on June 4, 2021
