
"Invalid value" in RMSprop implementation from scratch in Python

Data Science — Asked by Simen on September 5, 2021

Edit 2: The regularization term (reg_term) is sometimes negative due to negative parameters, so S[f"dW{l}"] ends up containing negative values. I realize the reg_term has to be added inside the square, i.e. before squaring, like this:

S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * (np.square(gradients[f"dW{l}"] + reg_term))

Edit 1: I see that S[f"dW{l}"] contains some negative values. How is this possible when np.square(gradients[f"dW{l}"]) always contains non-negative values?
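
A minimal standalone sketch shows how this happens (the grad and reg_term values below are made up for illustration, not taken from an actual training run): np.square is indeed non-negative, but the original update adds the negative reg_term after squaring, which can pull the accumulator below zero:

import numpy as np

beta2 = 0.999
S = 0.0                      # accumulator starts at zero
grad = np.array([0.01])      # small gradient (illustrative value)
reg_term = np.array([-0.5])  # negative because the corresponding weight is negative

# reg_term added *after* squaring, as in the original code:
S = beta2 * S + (1 - beta2) * (np.square(grad) + reg_term)
print(S)           # [-0.0004999] -> negative
print(np.sqrt(S))  # RuntimeWarning: invalid value encountered in sqrt -> [nan]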

I have implemented a neural network from scratch that uses mini-batch gradient descent, and it trains well; I have also verified that it works with momentum. Unfortunately, I can't get my RMSprop implementation to work.

I get a RuntimeWarning when training the network with RMSprop: "invalid value encountered in sqrt". It happens in the RMSprop update step.
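
For reference, NumPy emits exactly this warning whenever np.sqrt sees a negative entry, and it returns nan, which then propagates into the parameters on the next update (a minimal reproduction):

import numpy as np

x = np.sqrt(np.array([-1.0]))
# RuntimeWarning: invalid value encountered in sqrt
print(x)  # [nan]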

My implementation of update parameters:

import numpy as np

def update_parameters(parameters, gradients, V, S, batch_size, t, learning_rate, reg_param):
    L = len(parameters) // 2
    beta1 = 0.9   # unused in this function (momentum coefficient)
    beta2 = 0.999
    epsilon = 1e-8

    for l in range(1, L+1):
        # L2 regularization term; negative wherever the weights are negative
        reg_term = (reg_param / batch_size) * parameters[f"W{l}"]

        # RMSprop moving average of squared gradients
        # (the buggy line from Edit 2: reg_term is added *after* squaring,
        # so S[f"dW{l}"] can become negative)
        S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * (np.square(gradients[f"dW{l}"]) + reg_term)
        S[f"db{l}"] = beta2 * S[f"db{l}"] + (1 - beta2) * np.square(gradients[f"db{l}"])

        # RMSprop update (epsilon belongs inside the denominator; the original
        # parenthesization added it to the quotient instead)
        parameters[f"W{l}"] -= learning_rate * gradients[f"dW{l}"] / (np.sqrt(S[f"dW{l}"]) + epsilon)
        parameters[f"b{l}"] -= learning_rate * gradients[f"db{l}"] / (np.sqrt(S[f"db{l}"]) + epsilon)

This is how I initialize the parameters:

def init_params_V_and_S(activation_layers):
    params = {}
    V = {}  # momentum buffers; left empty here and presumably populated elsewhere
    S = {}
    L = len(activation_layers)

    for l in range(1, L):
        # He initialization for the weights, zeros for the biases
        params[f"W{l}"] = np.random.randn(activation_layers[l], activation_layers[l-1]) * np.sqrt(2 / activation_layers[l-1])
        params[f"b{l}"] = np.zeros((activation_layers[l], 1))

        # RMSprop accumulators start at zero
        S[f"dW{l}"] = np.zeros((activation_layers[l], activation_layers[l-1]))
        S[f"db{l}"] = np.zeros((activation_layers[l], 1))

    return params, V, S

Any ideas what’s causing this?

One Answer

The regularization term (reg_term) is sometimes negative due to negative parameters, so S[f"dW{l}"] contains some negative values. The reg_term has to be added inside the square, i.e. before squaring, like this:

S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * np.square(gradients[f"dW{l}"] + reg_term)

Answered by Simen on September 5, 2021
