Data Science Asked by indigo on March 11, 2021
Here is my code for the backpropagation weight update. It's a simple network with one hidden layer and one output neuron; both the hidden and output layers use tanh as the activation function. I propagate the error backwards using the gamma variables in the code.
The problem is that the average error barely changes after the first iteration. I initialize the weights randomly between -0.1 and 0.1 and use a learning rate of 0.03. Inputs are roughly between -0.001 and 0.001, and the target output is between -0.5 and 0.5. I just want to check whether the code is correct. I've tried higher values for the learning rate and initial weights, but then the error actually goes up and settles around 1.0. Is the code wrong, or do I just need to fine-tune the initial weights and learning rate? Many thanks for your replies!
ERROR = 0.5 * MathPow( (TARGET - output), 2 );
double gamma = ERROR * ( 1 - ( output * output ) );   // error scaled by the tanh derivative, 1 - output^2
// HIDDEN-OUTPUT LAYER UPDATE...
for ( int i=0; i<HIDDEN_LAYER_SIZE; i++ )
{
    double gamma_hi = gamma * HIDDEN_OUTPUT_WEIGHTS[ i ];   // propagate gamma back through the output weight
    HIDDEN_GAMMAS[ i ] = gamma_hi;
    double hi_weight_delta = LEARNING_RATE * gamma_hi * HIDDEN_OUTPUTS[ i ];
    HIDDEN_OUTPUT_WEIGHTS[ i ] += hi_weight_delta;
}
// INPUT-HIDDEN LAYER UPDATE
for ( int i=0; i<HIDDEN_LAYER_SIZE; i++ )
{
    for ( int j=0; j<INPUT_LAYER_SIZE; j++ )
    {
        double gamma_ij = HIDDEN_GAMMAS[ i ] * INPUT_HIDDEN_WEIGHTS[ i ][ j ];
        double ij_weight_delta = LEARNING_RATE * gamma_ij * INPUT_LAYER_INPUTS[ j ];
        INPUT_HIDDEN_WEIGHTS[ i ][ j ] += ij_weight_delta;
    }
}
// OUTPUT CODE:
// Calculate value of each hidden...
for ( int i=0; i<HIDDEN_LAYER_SIZE; i++ )
{
    double hi = 0.0;
    for ( int j=0; j<INPUT_LAYER_SIZE; j++ )
    {
        hi += INPUT_LAYER_INPUTS[ j ] * INPUT_HIDDEN_WEIGHTS[ i ][ j ];
    }
    HIDDEN_OUTPUTS[ i ] = TANH( hi );
}
double output = 0.0;
// Calculate value of the output
for ( int i=0; i<HIDDEN_LAYER_SIZE; i++ )
{
    output += HIDDEN_OUTPUTS[ i ] * HIDDEN_OUTPUT_WEIGHTS[ i ];
}
output = TANH( output );
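For reference, the ( 1 - ( output * output ) ) factor in the gamma calculation above is the derivative of tanh evaluated at the neuron's output:
$$ \frac{d}{dx}\tanh(x) = 1 - \tanh^2(x) $$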
It looks like you're adding the delta to the weights instead of subtracting it. Gradient descent is given by the following update, repeated for a number of iterations:
$$ x^{\text{current}} = x^{\text{previous}} - \alpha \frac{dy}{dx} $$
where $\alpha$ is the learning rate. We subtract because the derivative points in the direction of steepest ascent, so stepping against it moves us toward the minimum. So the following lines:
HIDDEN_OUTPUT_WEIGHTS[ i ] += hi_weight_delta;
INPUT_HIDDEN_WEIGHTS[ i ][ j ] += ij_weight_delta;
should be
HIDDEN_OUTPUT_WEIGHTS[ i ] -= hi_weight_delta;
INPUT_HIDDEN_WEIGHTS[ i ][ j ] -= ij_weight_delta;
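To see the effect of the sign concretely, here is a minimal, self-contained C++ sketch (not from your code) that minimizes y = x^2 by repeatedly subtracting the gradient; the starting point and learning rate are arbitrary choices for illustration:
#include <cstdio>

int main()
{
    double x = 5.0;                       // arbitrary starting point
    double learning_rate = 0.1;           // illustrative learning rate
    for ( int iter = 0; iter < 100; iter++ )
    {
        double dy_dx = 2.0 * x;           // derivative of y = x^2
        x -= learning_rate * dy_dx;       // subtract: step against the gradient, downhill
    }
    printf( "x after descent: %f\n", x ); // approaches 0, the minimum of y = x^2
    return 0;
}
Flipping the -= to += in this toy loop makes x grow without bound, which is consistent with the behaviour you describe where the error increases rather than decreases.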
Additionally, I'm not sure what the architecture of your network is, but it doesn't look like you have bias units; consider adding them so that your neurons can produce activations that aren't centered at zero. Not having them may limit your network's ability to learn.
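As a rough illustration of that suggestion, here is a sketch of your forward pass with bias terms added; HIDDEN_BIASES and OUTPUT_BIAS are hypothetical variables, not from your code, that would be initialized like the weights and updated during backpropagation alongside them:
// Sketch only: HIDDEN_BIASES[] and OUTPUT_BIAS are hypothetical additions,
// not variables from the original code.
for ( int i=0; i<HIDDEN_LAYER_SIZE; i++ )
{
    double hi = HIDDEN_BIASES[ i ];       // start from the bias instead of 0.0
    for ( int j=0; j<INPUT_LAYER_SIZE; j++ )
    {
        hi += INPUT_LAYER_INPUTS[ j ] * INPUT_HIDDEN_WEIGHTS[ i ][ j ];
    }
    HIDDEN_OUTPUTS[ i ] = TANH( hi );
}
double output = OUTPUT_BIAS;              // bias on the output neuron as well
for ( int i=0; i<HIDDEN_LAYER_SIZE; i++ )
{
    output += HIDDEN_OUTPUTS[ i ] * HIDDEN_OUTPUT_WEIGHTS[ i ];
}
output = TANH( output );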
Answered by Shan S on March 11, 2021