Cross Validated — asked by Pinocchio on December 9, 2020
Say I want to estimate the test error. I can either collect $N$ batches $B_n$ (each of size $B$) and take the average of their average errors (so the R.V. is the batch mean):
$$ \frac{1}{N} \sum^{N}_{n=1} \mu(B_n) $$
or I can collect all of the errors and take one big average (so the R.V. is the per-sample loss):
$$ \frac{1}{NB} \sum^{NB}_{i=1} L(z_i) $$
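To make the two estimators concrete, here is a minimal sketch of what I mean. The variable names and the synthetic loss distribution are made up purely for illustration; I'm assuming the per-sample losses $L(z_i)$ are collected into an array of shape $(N, B)$:

```python
import numpy as np

# Hypothetical setup: N batches of B per-sample losses each.
# The loss distribution is made up just to have numbers to average.
N, B = 50, 128
rng = np.random.default_rng(0)
losses = rng.normal(loc=12.0, scale=5.0, size=(N, B))  # losses[n, i] = L(z_i) in batch n

# Estimator 1: average of per-batch averages, (1/N) * sum_n mu(B_n)
avg_of_avgs = losses.mean(axis=1).mean()

# Estimator 2: one big average over all N*B losses, (1/(NB)) * sum_i L(z_i)
grand_avg = losses.reshape(-1).mean()

print(avg_of_avgs, grand_avg)  # identical up to float error when all batches have size B
```

As long as every batch has the same size $B$, the two point estimates coincide; my question is only about the spread around them.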
I'm fairly certain that they both have the same expected value, but do they have the same std? From my numerical experiments I don't think they do (the first one seems superior to the second, especially as $B$ gets larger):
Error with average of averages
80%|████████ | 4/5 [01:12<00:18, 18.14s/it]
-> err = 12.598432122111321 +-1.7049395893844483
Error with sum of everything
80%|████████ | 4/5 [01:11<00:17, 17.77s/it]
-> err = 11.505614456176758 +-13.968025155156523
What is the difference? Is the covariance somehow affecting things, and if so, how?
I understand that I could just make the batch size very big instead of taking lots of averages, but I'm annoyed that I don't understand the difference between these two. I don't think there should be a difference, and if there is one, WHEN does it happen? (See the toy sketch below for the kind of comparison I mean.)
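Here is a toy sketch of the spread comparison I have in mind (again with a made-up distribution and made-up sizes), assuming the per-sample losses are i.i.d.: the two point estimates match, but the std of the batch means and the std of the individual losses clearly do not.

```python
import numpy as np

# Made-up toy setup, only to illustrate the question: N batches of size B.
N, B = 50, 128
rng = np.random.default_rng(0)
losses = rng.normal(loc=12.0, scale=5.0, size=(N, B))

batch_means = losses.mean(axis=1)

# Spread of estimator 1: std across the N batch means.
std_of_batch_means = batch_means.std(ddof=1)

# Spread of estimator 2: std across all N*B individual losses.
std_of_losses = losses.reshape(-1).std(ddof=1)

print(f"std of batch means: {std_of_batch_means:.3f}")
print(f"std of all losses:  {std_of_losses:.3f}")
print(f"ratio:              {std_of_losses / std_of_batch_means:.1f}  (sqrt(B) = {B**0.5:.1f})")
```

In this toy setup the two spreads differ by roughly a factor of $\sqrt{B}$, which looks like the same kind of gap I see in my experiment, but I don't understand whether that is the whole story.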