Data Science: asked on August 10, 2021
I’m training a segmentation model, Unet++, on 2d images and I am now trying to find the optimal learning rate.
The backbone of the model is Resnet34, I use Adam optimizer and the loss function is the dice loss function.
I also use a few callback functions:
callbacks = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=15, verbose=1, min_delta=epsilon, mode='min'),
    keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, verbose=1, mode='min', cooldown=0, min_lr=1e-8),
    keras.callbacks.ModelCheckpoint(model_save_path, save_weights_only=True, save_best_only=True, mode='min'),
    keras.callbacks.CSVLogger(logger_save_path)
]
I plotted the curves of training and validation loss over epochs for a few learning rates. The behaviour differs depending on the learning rate: in one case the validation and training losses decrease slowly, and the validation loss does not oscillate (it is almost always decreasing); in another, the validation and training losses decrease quickly over the first 2 to 3 epochs, but after 6 or 7 epochs the validation loss starts increasing again.
I have a few questions (I hope it is not too many). Even a partial response would help me a lot.
I am afraid that, besides the learning rate, there are many other hyperparameters you have to choose values for, especially if you are using the Adam optimizer.
A principled approach is to tune the hyperparameters in rough order of importance, with the learning rate usually coming first.
To tune a set of hyperparameters, you first need to define a range that makes sense for each parameter. Then, given the number of different values you can afford to try, sample hyperparameter values at random from those ranges rather than trying every combination.
Specifically for the learning rate, you may want to try a wide range of values, e.g. from 0.0001 to 1. Rather than sampling uniformly between 0.0001 and 1 (which would almost never produce very small values), sample on a logarithmic scale: draw $x$ uniformly from $[-4, 0]$ and set the learning rate to $a = 10^{x}$.
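As a minimal sketch (assuming only NumPy; the variable names are illustrative), log-uniform sampling of candidate learning rates looks like this:

import numpy as np

rng = np.random.default_rng(42)

# Draw exponents uniformly in [-4, 0] and map them to learning rates in [1e-4, 1]
n_candidates = 10
exponents = rng.uniform(-4, 0, size=n_candidates)
candidate_lrs = 10.0 ** exponents
print(np.sort(candidate_lrs))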
As far as the number of epochs goes, you should set an early stopping callback with a patience of around 50 epochs, depending on your "exploration" budget. This means you give up training with a certain learning rate value if there is no improvement for the defined number of epochs.
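For illustration, one way to combine the two ideas is the sketch below; build_model, X_train, y_train, X_val and y_val are hypothetical placeholders, not names from the original post:

from tensorflow import keras

results = {}
for lr in candidate_lrs:  # the log-uniform samples drawn above
    model = build_model(lr)  # hypothetical factory that compiles a model with this learning rate
    history = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=500,
        callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=50,
                                                 restore_best_weights=True)],
        verbose=0,
    )
    results[lr] = min(history.history['val_loss'])

best_lr = min(results, key=results.get)  # learning rate with the lowest validation loss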
Hyperparameter tuning for neural networks is a form of art, one could say. For this reason, I suggest you look at basic methodologies for non-manual tuning, such as grid search (GridSearchCV) and random search (RandomizedSearchCV), which are implemented in the scikit-learn package. Additionally, it may be worth looking at more advanced techniques such as Bayesian optimisation with Gaussian processes and Tree-structured Parzen Estimators. Good luck!
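The snippet below is a sketch of such a randomized search with a Keras model wrapped for scikit-learn; X_train, y_train, X_test, y_test and the custom f1_m metric are assumed to be defined elsewhere: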
# Model instance
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout
from tensorflow.keras.callbacks import EarlyStopping
# scikit-learn wrapper shipped with TF/Keras at the time of writing (newer code would use scikeras)
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

input_shape = X_train.shape[1]

def create_model(n_hidden=1, n_neurons=30, learning_rate=0.01, drop_rate=0.5, act_func='relu',
                 act_func_out='sigmoid', kernel_init='uniform', opt='Adadelta'):
    model = Sequential()
    model.add(Dense(n_neurons, input_shape=(input_shape,), activation=act_func,
                    kernel_initializer=kernel_init))
    model.add(BatchNormalization())
    model.add(Dropout(drop_rate))
    # Add as many hidden layers as specified in n_hidden
    for layer in range(n_hidden):
        # Each hidden layer has n_neurons units
        model.add(Dense(n_neurons, activation=act_func, kernel_initializer=kernel_init))
        model.add(BatchNormalization())
        model.add(Dropout(drop_rate))
    model.add(Dense(1, activation=act_func_out, kernel_initializer=kernel_init))
    # Build the requested optimizer from its name and set its learning rate
    # (the original snippet hard-coded Adadelta, which silently ignored the `opt` argument)
    optimizer = keras.optimizers.get(opt)
    optimizer.learning_rate = learning_rate
    # f1_m is a custom F1 metric assumed to be defined elsewhere
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[f1_m])
    return model
params = dict(
    n_hidden=randint(4, 32),
    epochs=[50],
    n_neurons=randint(512, 600),
    act_func=['relu'],
    act_func_out=['sigmoid'],
    learning_rate=[0.01, 0.1, 0.3, 0.5],
    opt=['adam', 'Adadelta', 'Adagrad', 'RMSprop'],
    kernel_init=['uniform', 'normal', 'glorot_uniform'],
    batch_size=[256, 512, 1024, 2048],
    # scipy's uniform(0.1, 0.3) samples a fresh value in [0.1, 0.4] on every draw,
    # unlike np.random.uniform, which would fix a single value when the dict is built
    drop_rate=uniform(0.1, 0.3),
)
model = KerasClassifier(build_fn=create_model)

random_search = RandomizedSearchCV(model, params, n_iter=5, scoring='average_precision', cv=5)
random_search_results = random_search.fit(X_train, y_train,
                                          validation_data=(X_test, y_test),
                                          callbacks=[EarlyStopping(monitor='val_loss', patience=50)])
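Once the search has finished, the best hyperparameter combination and its cross-validated score can be read off the fitted search object via the standard scikit-learn attributes:

print('Best score: %.4f' % random_search_results.best_score_)
print('Best hyperparameters:', random_search_results.best_params_)
best_model = random_search_results.best_estimator_  # refit on the full training data (refit=True by default)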
Correct answer by hH1sG0n3 on August 10, 2021