Data Science Asked by Fredrik on February 16, 2021
I am trying to construct a pipeline in sklearn that applies different (in some cases multiple) transformations to different kinds of features (numeric, ordinal, binary nominal, and non-binary non-ordinal nominal). An additional twist is that I want to try out different kinds of each specific transformation in the pipeline, and sometimes none at all.
So far I have tried the following:
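from sklearn.compose import make_column_transformer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, RobustScaler, StandardScaler

# numerical_columns, categorical_columns, ordinal_columns and binary_columns
# are lists of column names defined elsewhere
preprocess = make_column_transformer(
    (numerical_columns, make_pipeline(RobustScaler(), PolynomialFeatures())),
    (categorical_columns, make_pipeline(OneHotEncoder())),
    (ordinal_columns, "passthrough"),
    (binary_columns, "passthrough"),
)
search_pipeline = Pipeline([("preprocessing", preprocess),
                            ("dimred", PCA()),
                            ("classifier", RandomForestClassifier())])
search_parameters = [
    {"preprocessing__pipeline__robustscaler": [None]},
    {"preprocessing__pipeline__robustscaler": [RobustScaler()]},
    {"preprocessing__pipeline__robustscaler": [StandardScaler()]},
    {"preprocessing__pipeline__polynomialfeatures": [None]},
    {"preprocessing__pipeline__polynomialfeatures": [PolynomialFeatures(degree=2)],
     "preprocessing__pipeline__polynomialfeatures__interaction_only": [False, True]},
    {"dimred": [None]},
    {"dimred": [PCA()], "dimred__n_components": [.95, .75]},
    {"dimred": [LinearDiscriminantAnalysis()], "dimred__n_components": [.95, .75]},
    {"classifier": [KNeighborsClassifier(weights="distance")],
     "classifier__n_neighbors": [3, 7, 11]},
    {"classifier": [RandomForestClassifier(n_estimators=100, class_weight="balanced")],
     "classifier__max_depth": [5, 10, None]}
]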
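A note on how GridSearchCV reads a list of dicts like search_parameters: each dict is treated as its own grid, and only the keys inside one dict are crossed with each other. Read that way, the grid above describes 17 separate candidates, each varying a single part of the pipeline while every other step keeps the value it was constructed with. The sketch below uses sklearn.model_selection.ParameterGrid to count candidates; the short keys "scaler" and "classifier" are hypothetical placeholders, since only the shape of the grid matters here.

```python
from sklearn.model_selection import ParameterGrid
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import RobustScaler, StandardScaler

# Hypothetical short keys; ParameterGrid does not check them against any estimator.
separate = [
    {"scaler": ["passthrough"]},
    {"scaler": [RobustScaler()]},
    {"scaler": [StandardScaler()]},
    {"classifier": [KNeighborsClassifier()], "classifier__n_neighbors": [3, 7, 11]},
]
combined = [
    {
        "scaler": ["passthrough", RobustScaler(), StandardScaler()],
        "classifier": [KNeighborsClassifier()],
        "classifier__n_neighbors": [3, 7, 11],
    },
]

# Separate dicts: candidate counts add up across dicts (1 + 1 + 1 + 3).
print(len(ParameterGrid(separate)))  # 6
# One dict: the listed values are crossed (3 scalers x 3 neighbour counts).
print(len(ParameterGrid(combined)))  # 9

# No candidate from the separate grid sets a scaler and a classifier together.
print(any("scaler" in p and "classifier" in p for p in ParameterGrid(separate)))  # False
```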
As you can see, I tried, for example, to apply different kinds of scalers to the numerical features:
None
RobustScaler
StandardScaler
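For context, a named Pipeline step can be replaced wholesale through set_params, and the string "passthrough" (or None) disables a transformer step. A minimal sketch on toy data, assuming a hypothetical single-step pipeline whose step is named "scaler":

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler

# Toy single-feature data with one outlier (illustrative only).
X = np.array([[1.0], [2.0], [3.0], [100.0]])

# An explicit step name ("scaler") makes the parameter key predictable.
pipe = Pipeline([("scaler", RobustScaler())])

results = {}
for candidate in ["passthrough", RobustScaler(), StandardScaler()]:
    # Replace the whole step by its name; "passthrough" turns it into a no-op.
    pipe.set_params(scaler=candidate)
    label = candidate if isinstance(candidate, str) else type(candidate).__name__
    results[label] = pipe.fit_transform(X)

for label, Xt in results.items():
    print(label, Xt.ravel().round(2))
```

Inside GridSearchCV the same replacement happens through the parameter grid, with the step name as the key.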
However, after running GridSearchCV:
CV = GridSearchCV(search_pipeline,
                  search_parameters, cv=5,
                  scoring="f1_weighted",
                  refit=True,
                  n_jobs=-1)
CV.fit(train_X, train_y)
I get the following error message:
ValueError: Invalid parameter robustscaler for estimator ColumnTransformer(transformers=[('list-1',
['income', 'reside', 'address', 'wireten',
'tollten', 'equipten', 'cardten', 'longten',
'age', 'employ', 'tenure'],
Pipeline(steps=[('robustscaler',
RobustScaler()),
('polynomialfeatures',
PolynomialFeatures())])),
('list-2', ['region', 'custcat'],
Pipeline(steps=[('onehotencoder',
OneHotEncoder())])),
('list-3', ['ed'], 'passthrough'),
('list-4',
['retire', 'callid', 'gender', 'marital',
'tollfree', 'equip', 'callcard', 'wireless',
'multline', 'voice', 'pager', 'internet',
'callwait', 'forward', 'confer', 'ebill'],
'passthrough')]). Check the list of available parameters with `estimator.get_params().keys()`.
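The names list-1 to list-4 in the error can be reproduced. make_column_transformer names each entry after the first element of its tuple (the lowercased class name, or the string itself for "passthrough"), adding -1, -2, ... suffixes to duplicates. Current scikit-learn expects (transformer, columns) tuples, so with the (columns, transformer) order used above the column lists are named as if they were the transformers, which suggests the installed version read the tuples that way. The sketch below, with shortened hypothetical column lists, compares both orders and then runs the get_params() check that the error message recommends.

```python
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, RobustScaler

# Shortened, hypothetical stand-ins for the real column lists.
numerical_columns = ["income", "age"]
categorical_columns = ["region", "custcat"]
ordinal_columns = ["ed"]
binary_columns = ["retire", "gender"]


def transformer_names(ct):
    """Return the auto-generated name of each (name, transformer, columns) entry."""
    return [name for name, _, _ in ct.transformers]


# (columns, transformer) order, as in the question: names come from the lists.
as_in_question = make_column_transformer(
    (numerical_columns, make_pipeline(RobustScaler(), PolynomialFeatures())),
    (categorical_columns, make_pipeline(OneHotEncoder())),
    (ordinal_columns, "passthrough"),
    (binary_columns, "passthrough"),
)

# (transformer, columns) order: names come from the transformers.
swapped = make_column_transformer(
    (make_pipeline(RobustScaler(), PolynomialFeatures()), numerical_columns),
    (make_pipeline(OneHotEncoder()), categorical_columns),
    ("passthrough", ordinal_columns),
    ("passthrough", binary_columns),
)

print(transformer_names(as_in_question))  # ['list-1', 'list-2', 'list-3', 'list-4']
print(transformer_names(swapped))  # ['pipeline-1', 'pipeline-2', 'passthrough-1', 'passthrough-2']

# The check suggested by the error: which keys address the scaler step?
print([k for k in swapped.get_params() if k.endswith("robustscaler")])
# ['pipeline-1__robustscaler']
```

With the tuples in (transformer, columns) order, the scaler inside search_pipeline would therefore be addressed as preprocessing__pipeline-1__robustscaler rather than preprocessing__pipeline__robustscaler.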
I suspect that the syntax I use in search_parameters to access the parameters of specific transformers is incorrect, but what is the correct syntax?
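One way to make the keys predictable is to build the ColumnTransformer and the inner Pipeline with explicit names, so that a key reads <pipeline step>__<column-transformer entry>__<inner step>, optionally followed by __<parameter>. This is a sketch under stated assumptions rather than a drop-in fix: it uses synthetic data from make_classification with integer column indices instead of train_X/train_y and the real column names, keeps only two column groups to stay short, and the names "num", "bin", "scaler" and "poly" are hypothetical. Options meant to be combined sit in the same dict.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, RobustScaler, StandardScaler

# Toy stand-in for train_X/train_y: columns 0-3 play the numerical features,
# columns 4-5 the binary ones (both column lists are hypothetical).
X, y = make_classification(n_samples=120, n_features=4, random_state=0)
binary = np.random.default_rng(0).integers(0, 2, size=(X.shape[0], 2))
X = np.hstack([X, binary])
numerical_columns = [0, 1, 2, 3]
binary_columns = [4, 5]

# Explicit names instead of auto-generated ones like "pipeline-1".
preprocess = ColumnTransformer([
    ("num", Pipeline([("scaler", RobustScaler()),
                      ("poly", PolynomialFeatures())]), numerical_columns),
    ("bin", "passthrough", binary_columns),
])
search_pipeline = Pipeline([
    ("preprocessing", preprocess),
    ("dimred", PCA()),
    ("classifier", RandomForestClassifier()),
])

# Key pattern: <outer step>__<ColumnTransformer entry>__<inner step>[__<param>].
search_parameters = [
    {   # every scaler x poly x dimred x n_neighbors combination with KNN
        "preprocessing__num__scaler": ["passthrough", RobustScaler(), StandardScaler()],
        "preprocessing__num__poly": ["passthrough", PolynomialFeatures(degree=2)],
        "dimred": ["passthrough", PCA(n_components=0.95, svd_solver="full")],
        "classifier": [KNeighborsClassifier(weights="distance")],
        "classifier__n_neighbors": [3, 7],
    },
    {   # random forest without scaling; other steps keep their constructed values
        "preprocessing__num__scaler": ["passthrough"],
        "classifier": [RandomForestClassifier(n_estimators=20, random_state=0)],
        "classifier__max_depth": [3, None],
    },
]

search = GridSearchCV(search_pipeline, search_parameters, cv=3, scoring="f1_weighted")
search.fit(X, y)
print(search.best_params_)
```

Separately, LinearDiscriminantAnalysis takes an integer n_components no larger than min(n_classes - 1, n_features), so the fractional values .95 and .75 from the original grid would most likely be rejected for it; the sketch leaves LDA out for that reason.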