Data Science Asked on January 6, 2021
When building a Pipeline I’m ending up at a scenario that can be simplified like this:
FeatureUnion(NumericalPipeline(steps), CategoricalPipeline(steps))
Since this is one intermediary step in a larger Pipeline, I’m feeding the preceding inputs into both of these and selecting the corresponding dtypes within the Numerical and Categorical Pipelines.
For some datasets, however, no Categorical Columns are left, which causes the Pipeline to fail. I’ve tried returning an empty list and ‘None’, but neither resulted in the Pipeline skipping the “empty” CategoricalPipeline.
After further investigation it turns out that the SimpleImputer() in the CategoricalPipeline causes the error. Depending on the order of steps the following messages are shown:
ValueError: Found array with 0 feature(s) (shape=(150, 0)) while a minimum of 1 is required.
ValueError: at least one array or dtype is required
Any ideas on how to skip the Imputer when no Column is present?
All(?) the sklearn transformers validate their input data (check_array, or check_X_y when y is validated as well), which includes a check that the input has at least one feature. You could probably monkey-patch out that check, but that seems like overkill.
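To see that check in isolation, here is a minimal sketch that reproduces the first error from the question. It assumes a zero-column NumPy array stands in for what the CategoricalPipeline receives; the names X_empty and error_message are just for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Stand-in for the categorical branch's input: 150 rows, no columns.
X_empty = np.empty((150, 0))

try:
    SimpleImputer().fit(X_empty)
    error_message = None
except ValueError as exc:
    # Input validation rejects arrays with fewer than one feature.
    error_message = str(exc)

print(error_message)
```

The exact wording of the message may differ slightly between scikit-learn versions.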
Instead, ColumnTransformer seems the way to go. Its main purpose fits your situation. It deals with an empty column selector gracefully, by just not calling fit on that transformer:
transformers_ : list
The collection of fitted transformers... In case there were no columns selected, this will be the unfitted transformer.
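A rough sketch of how that could look, not necessarily matching your actual steps: the step contents (median imputation, scaling, one-hot encoding), the toy DataFrame, and all variable names are assumptions for illustration. make_column_selector picks columns by dtype at fit time, so the categorical branch ends up with an empty selection here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-ins for the NumericalPipeline and CategoricalPipeline.
numerical = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Each branch receives only the columns chosen by its selector.
preprocess = ColumnTransformer([
    ("num", numerical, make_column_selector(dtype_include=np.number)),
    ("cat", categorical, make_column_selector(dtype_include=["object", "category"])),
])

# Toy data with numeric columns only, so the "cat" selector matches nothing.
X = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [4.0, 5.0, 6.0]})

# The categorical branch is never fitted, so its imputer never sees an empty array.
Xt = preprocess.fit_transform(X)
print(Xt.shape)
```

The resulting ColumnTransformer can then replace the FeatureUnion as a step in the larger Pipeline, provided the data reaching it is still a DataFrame so that dtype-based selection works.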
Unless you're removing columns earlier in the pipeline? In that case, please provide that additional context.
Answered by Ben Reiniger on January 6, 2021