# multilayer perceptron do not converge

Data Science Asked by userFarkill on January 4, 2021

I have been coding my own multi layer perceptron in MATLAB and it can be compiled without error. My training data features,x, has values from 1 to 360, and training data output, y, has the value of sin(x).

The thing is my MLP only decreases the cost for the first few iterations and will get stuck at 0.5. I have tried including momentum but it doesn’t help and increasing the layers or increasing the neurons do not help at all. I am not sure why this is happening ….

The summary of my code is
1) I normalize my input data either using min-max or zscore

2) Initialize random weights and bias within the range of -1 to 1

for i = 1:length(nodesateachlayer)-1
weights{i} = 2*rand(nodesateachlayer(i),nodesateachlayer(i+1))-1;
bias{i} = 2*rand(nodesateachlayer(i+1),1)-1;
end


3) then I do a forward pass where input is multiplied by weights and added with bias then activated by a transfer function (sigmoid)

for i = 2:length(nodesateachlayer)
stored{i} = nactivate(bsxfun(@plus,(weights{i-1}'*stored{i-1}),bias{i-1}),activation);
end


4) then calculate the error then do a backward pass

dedp = 1/length(normy)*error;
for i = length(stored)-1:-1:1
dpds = derivative(stored{i+1},activation);
deds = dpds'.*dedp;
dedw = stored{i}*deds;
dedb = ones(1,rowno)*deds;
dedp = (weights{i}*deds')';
weights{i}=weights{i}-rate.*dedw;
bias{i}=bsxfun(@minus,bias{i},rate.*dedb');
end


5) I have the cost plotted out at every iteration to see the descent

I assume there is something wrong with the code so where could the error possibly lies in

My view on your question, is that tiny networks seldom work. The above method uses a Neural Network to learn the function $$y=sin(x)$$. Although this problems seems simple, it cannot be expected to be solved by a really tiny network (the above model uses a 5-layer MLP with hidden size [5,6,7], which is small).

Even if back-propagation is implemented correctly, would the model learn anything? No. I suppose Tensorflow implemented back-propagation correctly, here is the result using Tensorflow:

You see, it learns almost nothing. In fact, the MSE loss is very close to 0.5 as stated above.

My suggestion is to try a 3 layer MLP with hidden size 256. Here is the result:

You can see it's much better. MSE<0.1 now.

------------------code---------------------

x_ =np.atleast_2d(np.arange(0,360,1)).T
y_ = np.atleast_2d(np.sin(x_/180*np.pi))
g = tf.Graph()
with g.as_default():
with tf.variable_scope("mlp"):
input_x = tf.placeholder(shape=[None, 1], dtype=tf.float32)
input_y = tf.placeholder(shape=[None,1], dtype=tf.float32)
layer1 = tf.layers.dense(inputs=input_x, units=256, activation=tf.nn.sigmoid)
#layer2 = tf.layers.dense(inputs=input_x, units=6, activation=tf.nn.sigmoid)
#layer3 = tf.layers.dense(inputs=input_x, units=7, activation=tf.nn.sigmoid)
output_y = tf.layers.dense(inputs=layer1, units=1) # inputs=layer1
loss = tf.losses.mean_squared_error(input_y, output_y)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for epoch in range(300):
_, loss_ = sess.run((train_op, loss), feed_dict={input_x:x_, input_y:y_})
y_hat_ = sess.run(output_y, feed_dict={input_x:x_, input_y:y_})
print(loss_, end='t')
plt.plot(x_,y_, 'g', x_,y_hat_,'b')
plt.legend(['ground truth', 'predicted'])


Answered by user12075 on January 4, 2021