Data Science Asked on September 30, 2021
I recently went through a few tutorials and examples, and all of them (not sure if just for demonstration purposes) performed gradient descent on a single input. To understand backpropagation more deeply, I wrote a program that implements it myself. In linear/logistic regression it makes sense to do gradient descent on the average of the costs over multiple inputs and outputs, because there is just one layer of weights and the inputs directly affect the outputs.

In the case of neural networks, we get back a layer of activations (outputs), and we have expected outputs of matching shape, so we compute the costs by subtracting the expected output from our actual output, and we propagate this back with the chain rule. But this way we have to compare our costs with the activations of the neurons, which are unique and depend on the inputs. So even if we took a batch of inputs, computed their costs, and averaged them, how would we decide which layer of neuron activations that average should be compared to?
As far as I understand your question, yes, you can have multiple features (variables/inputs) and use gradient descent to minimize a loss function that depends on those input variables. Here is an article that gives you a basic vector-calculus idea of what a gradient is and what it does in the algorithm: https://towardsdatascience.com/wondering-why-do-you-subtract-gradient-in-a-gradient-descent-algorithm-9b5aabdf8150
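For illustration (this is not from the linked article), a minimal NumPy sketch of batch gradient descent with several input features might look like the following; the data, learning rate, and variable names are made up:

```python
import numpy as np

# Toy data: 100 samples, 3 input features (illustrative values only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)   # one weight per feature
b = 0.0
lr = 0.1          # learning rate (assumed)

for _ in range(500):
    y_pred = X @ w + b
    error = y_pred - y                  # per-sample errors, shape (100,)
    loss = np.mean(error ** 2)          # one scalar loss for the whole batch
    grad_w = 2 * X.T @ error / len(y)   # gradient of the mean squared error w.r.t. w
    grad_b = 2 * error.mean()
    w -= lr * grad_w
    b -= lr * grad_b
```

The loss is a single scalar averaged over all samples, and its gradient with respect to each weight is likewise averaged over the batch.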
Correct answer by Mayank Mishra on September 30, 2021
We calculate the loss only once per batch (averaged over the data points) and only at the output layer, not at every neuron.
Loss = Loss_fn(y_true, y_pred)
In other words, this Loss_fn is a function of all the weights/biases, and we want to know the gradient with respect to each weight/bias.
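For example, with mean squared error as an illustrative choice of Loss_fn, the single scalar loss for a batch of $N$ examples would be

$$\text{Loss} = \frac{1}{N}\sum_{i=1}^{N}\big(y_{\text{pred}}^{(i)} - y_{\text{true}}^{(i)}\big)^2$$

which is compared only against the output-layer activations, never against hidden-layer activations.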
We can't get this directly for each parameter, as each layer depends on the previous layer.

The output is something like f( g( h( k(x) ) ) ).

This is where you apply the chain rule to propagate the gradient backward, layer by layer, all the way to the first layer.
[Image: backpropagation / chain rule diagram. Image credit - https://leonardoaraujosantos.gitbook.io/]
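Written out for the composition above, the chain rule for a weight $w$ inside the innermost function $k$ looks like this (a generic illustration):

$$\frac{\partial\,\text{Loss}}{\partial w} = \frac{\partial\,\text{Loss}}{\partial f}\cdot\frac{\partial f}{\partial g}\cdot\frac{\partial g}{\partial h}\cdot\frac{\partial h}{\partial k}\cdot\frac{\partial k}{\partial w}$$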
For the last layer, we can get the gradient directly, since for this layer we have both the output error and the layer's input (the output of the previous layer from the forward pass).
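A minimal NumPy sketch of the whole idea (my own illustration, assuming a one-hidden-layer network with sigmoid activation and mean-squared-error loss; shapes and values are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))        # batch of 32 inputs, 4 features each
y_true = rng.normal(size=(32, 1))   # expected outputs with matching shape

W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # hidden layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output layer

# Forward pass
a1 = sigmoid(X @ W1 + b1)           # hidden activations
y_pred = a1 @ W2 + b2               # output-layer activations

# Loss is computed once, at the output layer only, averaged over the batch
loss = np.mean((y_pred - y_true) ** 2)

# Backward pass (chain rule); every gradient is already averaged over the batch
dL_dy = 2 * (y_pred - y_true) / len(X)   # dLoss / dy_pred
dW2 = a1.T @ dL_dy                       # uses the output error and the layer's input a1
db2 = dL_dy.sum(axis=0)
da1 = dL_dy @ W2.T                       # propagate the error to the hidden layer
dz1 = da1 * a1 * (1 - a1)                # sigmoid derivative
dW1 = X.T @ dz1
db1 = dz1.sum(axis=0)
```

Note that the expected outputs are only ever compared to the output-layer activations; the hidden layers receive their error signals through the chain rule, not through a direct comparison.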
Answered by 10xAI on September 30, 2021