multilayer perceptron do not converge

Question

I have been coding my own multi layer perceptron in MATLAB and it can be compiled without error. My training data features,x, has values from 1 to 360, and training data output, y, has the value of sin(x).

The thing is my MLP only decreases the cost for the first few iterations and will get stuck at 0.5. I have tried including momentum but it doesn't help and increasing the layers or increasing the neurons do not help at all. I am not sure why this is happening ....

I have uploaded the files for your reference here.

The summary of my code is
1) I normalize my input data either using min-max or zscore

2) Initialize random weights and bias within the range of -1 to 1

for i = 1:length(nodesateachlayer)-1    
    weights{i} = 2*rand(nodesateachlayer(i),nodesateachlayer(i+1))-1; 
    bias{i} = 2*rand(nodesateachlayer(i+1),1)-1; 
end

3) then I do a forward pass where input is multiplied by weights and added with bias then activated by a transfer function (sigmoid)

for i = 2:length(nodesateachlayer)
        stored{i} = nactivate(bsxfun(@plus,(weights{i-1}'*stored{i-1}),bias{i-1}),activation);    
    end

4) then calculate the error then do a backward pass

dedp = 1/length(normy)*error;
    for i = length(stored)-1:-1:1
        dpds = derivative(stored{i+1},activation);
        deds = dpds'.*dedp;
        dedw = stored{i}*deds; 
        dedb = ones(1,rowno)*deds;
        dedp = (weights{i}*deds')';
        weights{i}=weights{i}-rate.*dedw;
        bias{i}=bsxfun(@minus,bias{i},rate.*dedb');
    end

5) I have the cost plotted out at every iteration to see the descent

I assume there is something wrong with the code so where could the error possibly lies in

user12075 · Answer

My view on your question, is that tiny networks seldom work. The above method uses a Neural Network to learn the function $y=sin(x)$. Although this problems seems simple, it cannot be expected to be solved by a really tiny network (the above model uses a 5-layer MLP with hidden size [5,6,7], which is small).
Even if back-propagation is implemented correctly, would the model learn anything? No. I suppose Tensorflow implemented back-propagation correctly, here is the result using Tensorflow:

You see, it learns almost nothing. In fact, the MSE loss is very close to 0.5 as stated above.
My suggestion is to try a 3 layer MLP with hidden size 256. Here is the result:

You can see it's much better. MSE<0.1 now.
------------------code---------------------

x_ =np.atleast_2d(np.arange(0,360,1)).T
y_ = np.atleast_2d(np.sin(x_/180*np.pi))
g = tf.Graph()
with g.as_default():
    with tf.variable_scope("mlp"):
        input_x = tf.placeholder(shape=[None, 1], dtype=tf.float32)
        input_y = tf.placeholder(shape=[None,1], dtype=tf.float32)
        layer1 = tf.layers.dense(inputs=input_x, units=256, activation=tf.nn.sigmoid)
        #layer2 = tf.layers.dense(inputs=input_x, units=6, activation=tf.nn.sigmoid)
        #layer3 = tf.layers.dense(inputs=input_x, units=7, activation=tf.nn.sigmoid)
        output_y = tf.layers.dense(inputs=layer1, units=1) # inputs=layer1
        loss = tf.losses.mean_squared_error(input_y, output_y)
        train_op = tf.train.AdagradOptimizer(0.01).minimize(loss)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(300):
            _, loss_ = sess.run((train_op, loss), feed_dict={input_x:x_, input_y:y_})
        y_hat_ = sess.run(output_y, feed_dict={input_x:x_, input_y:y_})
        print(loss_, end='t')
plt.plot(x_,y_, 'g', x_,y_hat_,'b')
plt.legend(['ground truth', 'predicted'])

multilayer perceptron do not converge

One Answer

Add your own answers!

Ask a Question