Number of samples and R^2

Cross Validated Asked by ss_19 on November 16, 2021

I have a linear regression model that gives me lower R^2 values as I increase the number of samples. The highest R^2 value I get is ~0.5, which makes me doubt that this is a problem related to overfitting. What would be an explanation for this observation?

One Answer

If the samples you are adding are effectively random, i.e. not produced by the same mechanism that generated the data your model was fitted to, then this is not unexpected.

You are introducing noise into the data, which makes the model perform worse.
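
To make this concrete, here is a minimal simulation sketch. The mechanism `y = 2x + noise`, the sample sizes, and the helper `r_squared` are all assumptions made up for illustration, not anything taken from the question's data. It fits a simple regression on 50 "real" samples, then again after appending 50 samples whose x and y are unrelated.

```python
import random

def r_squared(x, y):
    """R^2 of a least-squares line y = a + b*x (hypothetical helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # With one predictor and an intercept, R^2 equals the squared correlation.
    return sxy ** 2 / (sxx * syy)

rng = random.Random(0)

# Original samples: y really depends on x (assumed mechanism y = 2x + noise).
x_real = [rng.uniform(0, 10) for _ in range(50)]
y_real = [2 * xi + rng.gauss(0, 2) for xi in x_real]

# Added samples: x and y are drawn independently, so they carry no relationship.
x_junk = [rng.uniform(0, 10) for _ in range(50)]
y_junk = [rng.uniform(0, 20) for _ in range(50)]

r2_before = r_squared(x_real, y_real)
r2_after = r_squared(x_real + x_junk, y_real + y_junk)
print(f"R^2 before adding noise samples: {r2_before:.2f}")
print(f"R^2 after adding noise samples:  {r2_after:.2f}")
```

Under these assumptions the second R^2 should come out clearly lower than the first, because half of the pooled data contains no x-y relationship for the line to explain.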


If, on the other hand, the added samples come from the same mechanism that generated the original data (i.e. they are not made up), the drop in R^2 indicates that something is wrong. Two possible explanations:

  1. The relationship between predictor and response is not actually that strong, but the initial subset of the data happened, by luck, to make it look strong.

  2. The relationship differs between the first sample and the later-added samples. This can happen if the two subsets were collected at different times or under different conditions.
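
Both explanations can be illustrated with a small simulation. This is only a sketch: the slopes, noise levels, sample sizes, and the helper `r_squared` are hypothetical choices, not values from the question. Case 1 draws many small subsets from weakly related data to show how much R^2 varies by chance; case 2 pools two batches that each follow a clear but different line.

```python
import random

def r_squared(x, y):
    """R^2 of a least-squares line y = a + b*x (hypothetical helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

rng = random.Random(1)

# Case 1: a weak relationship, where a small subset can look good by luck.
x = [rng.uniform(0, 10) for _ in range(1000)]
y = [0.3 * xi + rng.gauss(0, 3) for xi in x]
r2_full = r_squared(x, y)

r2_small = []
for _ in range(200):
    idx = rng.sample(range(1000), 10)  # a random subset of 10 points
    r2_small.append(r_squared([x[i] for i in idx], [y[i] for i in idx]))

print(f"case 1: full-data R^2 = {r2_full:.2f}, "
      f"10-point subsets range from {min(r2_small):.2f} to {max(r2_small):.2f}")

# Case 2: two batches with different relationships (assumed opposite slopes).
xa = [rng.uniform(0, 10) for _ in range(100)]
ya = [2 * xi + rng.gauss(0, 2) for xi in xa]
xb = [rng.uniform(0, 10) for _ in range(100)]
yb = [20 - 2 * xi + rng.gauss(0, 2) for xi in xb]

r2_a = r_squared(xa, ya)
r2_b = r_squared(xb, yb)
r2_both = r_squared(xa + xb, ya + yb)
print(f"case 2: batch A R^2 = {r2_a:.2f}, batch B R^2 = {r2_b:.2f}, "
      f"pooled R^2 = {r2_both:.2f}")
```

A practical check along the same lines is to fit the model separately on the original and the added samples: similar coefficients with a lower pooled R^2 point towards the first explanation, clearly different coefficients point towards the second.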

Answered by Nuclear03020704 on November 16, 2021
