Data Science Asked on December 19, 2021
I'm using the GridSearchCV class from scikit-learn to perform hyperparameter optimization on a sequential neural network. I've built a pipeline that also searches for the best number of features by putting a feature selector inside it. The problem is how to define input_shape, since it depends on the k parameter of the feature selector. Is it possible to make classifier__input_shape always take the same value as feature_selector__feature_selector__k?
The relevant piece of code is below.
def create_model (input_shape, learn_rate = 0.01, dropout_rate = 0.0, weight_constraint = 0):
    # input_shape has no default value, so it has to come before the defaulted arguments
    model = Sequential ()
    model.add (Dense (units = 64, activation = 'relu',
                      input_shape = (input_shape, )))
    model.add (Dropout (dropout_rate))
    model.add (Dense (32, activation = 'relu'))
    model.add (Dense (1, activation = 'sigmoid'))
    model.compile (loss = 'binary_crossentropy',
                   optimizer = Adam (lr = learn_rate),
                   metrics = ['accuracy'])
    return model
standard_scaler_features = remaining_features
my_scaler = StandardScaler ()
steps = list ()
steps.append (('scaler', my_scaler))
standard_scaler_transformer = Pipeline (steps)

my_feature_selector = SelectKBest ()
steps = list ()
steps.append (('feature_selector', my_feature_selector))
feature_selector_transformer = Pipeline (steps)

clf = KerasClassifier (build_fn = create_model, verbose = 2)
clf = Pipeline (steps = [('scaler', my_scaler),
                         ('feature_selector', feature_selector_transformer),
                         ('classifier', clf)],
                verbose = True)

param_grid = {'feature_selector__feature_selector__score_func' : [f_classif],
              'feature_selector__feature_selector__k' : [7, 9, 15],
              'classifier__input_shape' : [7, 9, 15],
              'classifier__epochs' : [2, 3, 4]}
cv = RepeatedStratifiedKFold (n_splits = 5, n_repeats = 1, random_state = STATE)
grid = GridSearchCV (estimator = clf, param_grid = param_grid, scoring = 'f1',
                     verbose = 1, n_jobs = 1, cv = cv)
grid_result = grid.fit (X_train_df, y_train_df)
And the error:
ValueError: Input 0 of layer sequential_9 is incompatible with the layer: expected axis -1 of input shape to have value 9 but received input with shape [None, 7]
I see two solutions: generate a param_grid that avoids irrelevant combinations, or build the model at fit time so that feature_selector__feature_selector__k and classifier__input_shape can never disagree.
First solution: you can generate the right list of combinations using something close to this:
param_grid = [
    {
        'feature_selector__feature_selector__score_func' : [f_classif],
        'feature_selector__feature_selector__k' : [k],
        'classifier__input_shape' : [k],
        'classifier__dropout_rate' : [0.0, 0.5]
    }
    for k in [7, 9, 15]
]
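To see why the list-of-dicts form helps, the sketch below expands both grid styles with plain itertools. The expand helper is a hypothetical stand-in that only mimics how a grid search enumerates candidates (a single dict gives the full Cartesian product, a list of dicts gives the union of each dict's product); it is not scikit-learn's own code.

```python
from itertools import product


def expand(grid):
    """Enumerate candidate settings for a dict grid or a list of dict grids.

    Hypothetical helper, for illustration only.
    """
    grids = [grid] if isinstance(grid, dict) else grid
    candidates = []
    for g in grids:
        keys = sorted(g)
        for values in product(*(g[key] for key in keys)):
            candidates.append(dict(zip(keys, values)))
    return candidates


K = 'feature_selector__feature_selector__k'
SHAPE = 'classifier__input_shape'
ks = [7, 9, 15]

# Single dict: every k is paired with every input_shape.
single = expand({K: ks, SHAPE: ks})
mismatched = [c for c in single if c[K] != c[SHAPE]]

# List of dicts: one small grid per k, so the two values always agree.
per_k = expand([{K: [k], SHAPE: [k]} for k in ks])

print(len(single), len(mismatched))  # 9 6
print(len(per_k), all(c[K] == c[SHAPE] for c in per_k))  # 3 True
```

Six of the nine combinations in the single-dict grid pair a k with a different input_shape, which is the situation behind the ValueError in the question.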
Second solution: you can use a custom class that creates your model at fit time, based on the shape of X. Here is a code sample:
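import numpy as np
# Keras imports assume tensorflow.keras; adjust them to your installation
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
# Only available in TensorFlow versions that still ship the scikit-learn wrappers
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

class MyKerasClf():
    def __init__(self, dropout_rate = 0.0):
        # Default used when set_params is never called
        self.dropout_rate = dropout_rate

    def predict(self, X):
        y_pred_nn = self.clf.predict(X)
        return np.array(y_pred_nn).flatten()

    def create_model(self, learn_rate = 0.01, weight_constraint = 0 ):
        model = Sequential ()
        model.add (Dense (units = 64, activation = 'relu',
                          input_shape = (self.input_shape, )))
        model.add (Dropout (self.dropout_rate))
        model.add (Dense (32, activation = 'relu'))
        model.add (Dense (1, activation = 'sigmoid'))
        model.compile (loss = 'binary_crossentropy',
                       optimizer = Adam (lr = learn_rate),
                       metrics = ['accuracy'])
        return model

    def fit(self, X, y, **kwargs):
        # The number of columns kept by the feature selector sets the input shape
        self.input_shape = X.shape[1]
        self.clf = KerasClassifier(build_fn = self.create_model, verbose = 2)
        self.clf.fit(X, y, **kwargs)
        return self

    def set_params(self, **params):
        # The pipeline passes keys with the 'classifier__' prefix already stripped
        self.dropout_rate = params.get('dropout_rate', 0.0)
        return self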
Then you can use the class in your pipeline:
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_features=50, n_redundant=0, n_informative=2,
                           random_state=42, n_clusters_per_class=1)
my_scaler = StandardScaler ()
steps = list ()
steps.append (('scaler', my_scaler))
standard_scaler_transformer = Pipeline (steps)

my_feature_selector = SelectKBest ()
steps = list ()
steps.append (('feature_selector', my_feature_selector))
feature_selector_transformer = Pipeline (steps)

# Create a specific clf
my_clf = MyKerasClf( )
pip_clf = Pipeline (steps = [('scaler', my_scaler),
                             ('feature_selector', feature_selector_transformer),
                             ('classifier', my_clf)],
                    verbose = True)

# No classifier__input_shape entry: the wrapper reads it from X at fit time
param_grid = {'feature_selector__feature_selector__score_func' : [f_classif],
              'feature_selector__feature_selector__k' : [7, 15],
              'classifier__dropout_rate' : [0.0, 0.5]
              }
cv = RepeatedStratifiedKFold (n_splits = 5, n_repeats = 1, random_state = 42)
grid = GridSearchCV (estimator = pip_clf, param_grid = param_grid, scoring = 'f1',
                     verbose = 1, n_jobs = 1, cv = cv)
grid_result = grid.fit(X, y)
Note: I also added the dropout rate to the grid search as an example.
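The second solution boils down to deferring model construction until fit, when the number of selected features can be read from X. Below is a framework-free sketch of that pattern. LazyModel, build_fn, and keep_first_k are hypothetical names used only for illustration: keep_first_k is a crude stand-in for a feature selector, and the "model" is just a dict.

```python
class LazyModel:
    """Builds its underlying model inside fit(), once the width of X is known."""

    def __init__(self, build_fn):
        self.build_fn = build_fn  # callable taking the number of input features
        self.model = None

    def fit(self, X, y):
        n_features = len(X[0])  # same role as X.shape[1] for a NumPy array
        self.model = self.build_fn(n_features)
        return self


def keep_first_k(X, k):
    """Stand-in for a feature selector: keep the first k columns of each row."""
    return [row[:k] for row in X]


def build_fn(n_features):
    """Stand-in for create_model: record the input shape it was built with."""
    return {'input_shape': (n_features,)}


X = [[float(i + j) for j in range(20)] for i in range(4)]
y = [0, 1, 0, 1]

built_shapes = {}
for k in (7, 9, 15):
    clf = LazyModel(build_fn).fit(keep_first_k(X, k), y)
    built_shapes[k] = clf.model['input_shape']

print(built_shapes)  # {7: (7,), 9: (9,), 15: (15,)}
```

Because the shape is read from the data that actually reaches the classifier, there is no second parameter to keep in sync with k.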
Answered by etiennedm on December 19, 2021