Data Science Asked by Josh Lowe on April 11, 2021
I’m very new to neural networks and have recently learnt about the loss functions used with them.
This question regards the mean squared error (MSE) metric, defined (in the textbook I’m using) as:
$frac{1}{n}sum_{i=1}^{n}(h_{theta}(x^{i}) - y^{i})^{2}$
where $h_{theta}(x^{i})$ gives the predicted value for $x^{i}$ using the model’s weights $theta$, and $y^{i}$ represents the actual value for the data point at index $i$.
Looking this function up online, I’ve seen different sources say different things, and I can’t work out what $n$ actually represents.
I understood it as the number of neurons in the output layer, so you’d be finding the difference between each output neuron’s actual value and the value the network predicts given its weights.
Some sources say it represents the number of training samples. If that’s the case, though, what does $h_{theta}(x^{i})$ represent? Is it a sum over the output neuron values? And if $n$ is the number of training samples, wouldn’t you have to evaluate the function over all of them to minimize it? With my previous understanding of $n$, you could run it on some samples rather than all of them.
The idea of mean squared error is to find the mean of the squared errors. Therefore, you divide by the number of squared errors you add up, which is the number of samples.
In more inference-oriented applications (e.g. linear regression and ordinary least squares), you may see the denominator given as $n-k$ or $n-p$, where $k$ and $p$ are the number of parameters in the regression. This has to do with making MSE an unbiased estimator of the conditional variance, an issue unlikely to interest you in neural networks that do pure predictive modeling, but I do not want you to get confused about what’s going on when you see that.
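A minimal sketch of both denominators (the arrays and the parameter count $p$ are made up for illustration):

```python
import numpy as np

# Hypothetical predictions and targets for n = 4 samples
y_pred = np.array([2.5, 0.0, 2.1, 7.8])
y_true = np.array([3.0, -0.5, 2.0, 8.0])

squared_errors = (y_pred - y_true) ** 2

n = len(y_true)                          # number of samples
mse = squared_errors.sum() / n           # the usual MSE: 0.1375

# Inference-oriented variant: divide by n - p, where p is the
# number of model parameters (p = 2 here purely for illustration)
p = 2
unbiased_mse = squared_errors.sum() / (n - p)  # 0.275
```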
Answered by Dave on April 11, 2021
First, try to understand a few points -
The output neuron value and the prediction are the same thing. In classification, we convert the output probability to a class based on a threshold.
MSE is used in regression, and in a regression problem you usually have one output neuron, e.g. price. You may have more if you want to combine multiple targets, e.g. in a bounding-box problem.
The "N" in the denominator is the number of individual errors calculated, which is the number of samples in context.
By "in context", I mean that during backpropagation it is the batch_size, and at the end of an epoch (or of training) it is the whole training dataset. (See the sketch below.)
what does $h_{theta}(x^{i})$ represent?
It is the prediction (the value of the output neuron) for the $i^{th}$ sample.
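A small sketch of how the same formula uses a different "N" depending on context (the array sizes and batch size are illustrative, not prescriptive):

```python
import numpy as np

def mse(y_pred, y_true):
    # N in the denominator = however many samples were passed in
    return np.mean((y_pred - y_true) ** 2)

rng = np.random.default_rng(0)
y_true = rng.normal(size=100)                       # hypothetical targets
y_pred = y_true + rng.normal(scale=0.1, size=100)   # hypothetical predictions

# During backpropagation: N = batch_size
batch_size = 10
batch_loss = mse(y_pred[:batch_size], y_true[:batch_size])

# At the end of an epoch: N = size of the whole training set
epoch_loss = mse(y_pred, y_true)
```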
Answered by 10xAI on April 11, 2021
$h_{theta}$ is a hypothesis function parameterized by $theta$, i.e. for different values of $theta$ you get a different hypothesis function.
$h_{theta}(x^{i})$ calculates the value of the hypothesis function, parameterized by a certain value of $theta$, on the input $x^{i}$. This is also called the predicted output.
$sum_{i=1}^{n}(h_{theta}(x^{i}) - y^{i})^{2}$: here we fix a certain value of $theta$ (also called the weights) and calculate the output of the hypothesis function for each sample $x^{i}$ (the predicted output). We then take its corresponding ground truth $y^{i}$ and compute the squared difference. We do this for all $n$ samples and sum them up.
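A tiny worked example, assuming a linear hypothesis $h_{theta}(x) = theta_0 + theta_1 x$ (chosen only for illustration; a neural network is just a more complex parameterized function):

```python
import numpy as np

def h(theta, x):
    # Simple linear hypothesis h_theta(x) = theta[0] + theta[1] * x
    return theta[0] + theta[1] * x

x = np.array([1.0, 2.0, 3.0])   # hypothetical inputs
y = np.array([2.0, 4.0, 6.0])   # corresponding ground truths

for theta in [(0.0, 1.0), (0.0, 2.0)]:
    preds = h(theta, x)                # predicted outputs for this theta
    sse = np.sum((preds - y) ** 2)     # sum of squared differences
    print(theta, sse)
# (0.0, 1.0) -> sse = 1 + 4 + 9 = 14
# (0.0, 2.0) -> sse = 0  (this theta fits the data exactly)
```

Different values of $theta$ give different sums; training amounts to searching for the $theta$ that makes this sum (or its mean) small.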
Answered by mujjiga on April 11, 2021