Data Science Asked by Simen on September 5, 2021
Edit 2: The regularization term (reg_term) is sometimes negative, because the parameters themselves can be negative. Hence S[f"dW{l}"] contains some negative values. I realize the reg_term has to be added to the gradient before squaring (so the accumulated value is non-negative before the sqrt is ever taken), like this:
S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * np.square(gradients[f"dW{l}"] + reg_term)
Edit 1: I see that S[f"dW{l}"] contains some negative values. How is this possible when np.square(gradients[f"dW{l}"]) always contains non-negative values?
I have implemented a neural network from scratch that is trained with mini-batch gradient descent, and it works well; I have verified that it also trains correctly with momentum. Unfortunately, I can't get my RMSprop implementation to work.
I get a RuntimeWarning when training the network with RMSprop: "invalid value encountered in sqrt". This happens in the RMSprop update step.
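For reference, this warning is NumPy's standard behavior when np.sqrt is applied to a negative value: it emits exactly this RuntimeWarning and returns nan. A minimal demonstration:

import numpy as np

S = np.array([0.5, -0.01])  # one negative entry, as observed in Edit 1
print(np.sqrt(S))           # RuntimeWarning: invalid value encountered in sqrt
# [0.70710678        nan]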
My implementation of update_parameters:
import numpy as np

def update_parameters(parameters, gradients, V, S, batch_size, t, learning_rate, reg_param):
    L = len(parameters) // 2
    beta1 = 0.9
    beta2 = 0.999
    epsilon = 1e-8
    for l in range(1, L + 1):
        reg_term = (reg_param / batch_size) * parameters[f"W{l}"]
        # RMSprop gradients
        S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * (np.square(gradients[f"dW{l}"]) + reg_term)
        S[f"db{l}"] = beta2 * S[f"db{l}"] + (1 - beta2) * np.square(gradients[f"db{l}"])
        # RMSprop update
        parameters[f"W{l}"] -= learning_rate * (gradients[f"dW{l}"] / (np.sqrt(S[f"dW{l}"])) + epsilon)  # RuntimeWarning raised here
        parameters[f"b{l}"] -= learning_rate * (gradients[f"db{l}"] / (np.sqrt(S[f"db{l}"])) + epsilon)
This is how I initialize the parameters:
def init_params_V_and_S(activation_layers):
    params = {}
    V = {}
    S = {}
    L = len(activation_layers)
    for l in range(1, L):
        # He initialization for the weights
        params[f"W{l}"] = np.random.randn(activation_layers[l], activation_layers[l - 1]) * np.sqrt(2 / activation_layers[l - 1])
        params[f"b{l}"] = np.zeros((activation_layers[l], 1))
        # RMSprop params
        S[f"dW{l}"] = np.zeros((activation_layers[l], activation_layers[l - 1]))
        S[f"db{l}"] = np.zeros((activation_layers[l], 1))
    return params, V, S
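A hypothetical usage example (the layer sizes here are invented for illustration):

# Hypothetical network: 784 inputs, one hidden layer of 64 units, 10 outputs
params, V, S = init_params_V_and_S([784, 64, 10])
print(params["W1"].shape)  # (64, 784)
print(S["db2"].shape)      # (10, 1)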
Any ideas what’s causing this?
The regularization term (reg_term) is sometimes negative, because the parameters themselves can be negative. Hence S[f"dW{l}"] contains some negative values, and np.sqrt then produces NaNs along with the RuntimeWarning. The reg_term has to be added to the gradient before squaring, like this:
S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * np.square(gradients[f"dW{l}"] + reg_term)
Answered by Simen on September 5, 2021