Data Science Asked by Simen on September 5, 2021
Edit 2: The regularization term (reg_term) is sometimes negative, because the parameters themselves can be negative. Hence S[f"dW{l}"] contains some negative values. I realize the reg_term has to be added to the gradient before squaring (so the accumulated value is non-negative before the sqrt is ever taken), like this:
S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * np.square(gradients[f"dW{l}"] + reg_term)
Edit 1: I see that S[f"dW{l}"] contains some negative values. How is this possible when np.square(gradients[f"dW{l}"]) always contains non-negative values?
I have implemented a neural network from scratch that is trained with mini-batch gradient descent, and it works well; I have verified that it also trains correctly with momentum. Unfortunately, I can't get my RMSprop implementation to work.
I get a RuntimeWarning when training the network with RMSprop: "invalid value encountered in sqrt". This happens in the RMSprop update step.
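For reference, this warning is NumPy's standard behavior when np.sqrt is applied to a negative value: it emits exactly this RuntimeWarning and returns nan. A minimal demonstration:

import numpy as np

S = np.array([0.5, -0.01])  # one negative entry, as observed in Edit 1
print(np.sqrt(S))           # RuntimeWarning: invalid value encountered in sqrt
# [0.70710678        nan]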
My implementation of update_parameters:
import numpy as np

def update_parameters(parameters, gradients, V, S, batch_size, t, learning_rate, reg_param):
    L = len(parameters) // 2
    beta1 = 0.9
    beta2 = 0.999
    epsilon = 1e-8
    for l in range(1, L + 1):
        reg_term = (reg_param / batch_size) * parameters[f"W{l}"]
        # RMSprop gradients
        S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * (np.square(gradients[f"dW{l}"]) + reg_term)
        S[f"db{l}"] = beta2 * S[f"db{l}"] + (1 - beta2) * np.square(gradients[f"db{l}"])
        # RMSprop update
        parameters[f"W{l}"] -= learning_rate * (gradients[f"dW{l}"] / (np.sqrt(S[f"dW{l}"])) + epsilon)  # RuntimeWarning raised here
        parameters[f"b{l}"] -= learning_rate * (gradients[f"db{l}"] / (np.sqrt(S[f"db{l}"])) + epsilon)
This is how I initialize the parameters:
def init_params_V_and_S(activation_layers):
    params = {}
    V = {}
    S = {}
    L = len(activation_layers)
    for l in range(1, L):
        # He initialization for the weights
        params[f"W{l}"] = np.random.randn(activation_layers[l], activation_layers[l - 1]) * np.sqrt(2 / activation_layers[l - 1])
        params[f"b{l}"] = np.zeros((activation_layers[l], 1))
        # RMSprop params
        S[f"dW{l}"] = np.zeros((activation_layers[l], activation_layers[l - 1]))
        S[f"db{l}"] = np.zeros((activation_layers[l], 1))
    return params, V, S
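A hypothetical usage example (the layer sizes here are invented for illustration):

# Hypothetical network: 784 inputs, one hidden layer of 64 units, 10 outputs
params, V, S = init_params_V_and_S([784, 64, 10])
print(params["W1"].shape)  # (64, 784)
print(S["db2"].shape)      # (10, 1)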
Any ideas what’s causing this?
The regularization term (reg_term) is sometimes negative, because the parameters themselves can be negative. Hence S[f"dW{l}"] contains some negative values, and np.sqrt then produces NaNs along with the RuntimeWarning. The reg_term has to be added to the gradient before squaring, like this:
S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * np.square(gradients[f"dW{l}"] + reg_term)
Answered by Simen on September 5, 2021