TransWikia.com

Making sure the p-values of my OLS estimates are correct

Economics Asked on May 6, 2021

I have learned the basics of the Classical Linear Regression Model and also various diagnostic tests to check if the assumptions of the CLRM are met, such as homoskedasticity, the absence of near-perfect multicollinearity, normality of residuals, etc.

I wanted to know if it is possible to have an exhaustive set of things I need to test to be reasonably confident about the p-values I get, both overall and for each specific regressor’s coefficient (in a multiple regression with cross sectional data).

For example, what are the things that a researcher would test to make sure that the p-values are technically correct (or at least acceptable) and publishable in an academic journal (not talking about the usefulness of the topic or whether there is causality etc, just the statistical inference, if that is acceptable or not)? Different textbooks/ lectures I have seen have put differing amounts of emphasis on different things. I wanted to know what would a professional researcher do?

One Answer

The answer to your question:

I wanted to know if it is possible to have an exhaustive set of things I need to test to be reasonably confident about the p-values I get, both overall and for each specific regressor's coefficient

is "no", but for a different reason than is stated in the comments.

To start, let me ground you in what a p-value is and what it is not. First, it is not a probability statement about your hypothesis; it is a conditional frequency statement. It is conditional on a utility function and a model of the world. If you have a bad model of the world, then your p-values are suspect.

Frequentist tests, as opposed to Bayesian tests, attempt to show that some null hypothesis is false. They do so by assuming that the null is true. That is not a trivial statement, for two reasons. First, the null hypothesis acts as information: you are conditioning all the observations on that assumption, as if it were true. If you changed your null, you would change your inference. Second, there is no way to distinguish a chance effect from a false null.

A p-value is a statement that if you repeated your experiment, possibly an infinite number of times, then the frequency of seeing the result you saw, or a result more extreme, would be no greater than the stated p-value, provided the null model is the true model.
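That repeated-sampling definition can be illustrated with a small Monte Carlo sketch. The sample size, observed mean, and null distribution below are hypothetical, chosen only for illustration:

```python
import random

random.seed(0)

# Null model: observations are draws from a standard normal with mean 0.
# Observed statistic: a sample mean of 0.5 from n = 25 draws (made-up numbers).
n, observed_mean = 25, 0.5

# "Repeat the experiment" many times under the null and count how often the
# sample mean is at least as extreme (two-sided) as the one observed.
reps = 20000
extreme = 0
for _ in range(reps):
    sample_mean = sum(random.gauss(0, 1) for _ in range(n)) / n
    if abs(sample_mean) >= observed_mean:
        extreme += 1

p_value = extreme / reps
print(round(p_value, 3))
```

Under this null the sample mean has standard error $1/\sqrt{25}=0.2$, so the observed mean sits 2.5 standard errors out, and the simulated frequency converges on the analytic two-sided p-value of about 0.012.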

So, let us look at the things you need to get right for a p-value to imply its abstract meaning.

First, your utility function needs to be correct. Least squares models assume that losses are quadratic, that is to say, $$U(\hat{\theta},\theta)=-c(\hat{\theta}-\theta)^2.$$ If you change your utility function, then you change your estimator, which in turn changes your p-values.

For example, if you instead assumed that $U(\hat{\theta},\theta)=-c|\hat{\theta}-\theta|$, then you would end up with median estimators and would use things such as Theil–Sen or quantile regression. The results are more robust but less powerful.
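The link between the loss function and the estimator can be seen with a toy grid search (the data are made up; no regression library needed). Quadratic loss is minimized by the sample mean, absolute loss by the sample median:

```python
# Hypothetical sample with one large outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# Grid of candidate estimates over [0, 100].
candidates = [t / 100 for t in range(0, 10001)]

def argmin(loss):
    """Return the grid point with the smallest loss."""
    return min(candidates, key=loss)

# Quadratic loss -> mean; absolute loss -> median.
quad_est = argmin(lambda t: sum((x - t) ** 2 for x in data))
abs_est = argmin(lambda t: sum(abs(x - t) for x in data))

print(quad_est)  # the sample mean, 22.0: pulled hard by the outlier
print(abs_est)   # the sample median, 3.0: robust to the outlier
```

The same data yield very different "best" estimates once the loss changes, which is exactly why the associated p-values change too.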

Second, you need to get your specification correct. P-values assume the null model is true; if your model is misspecified, then your p-values will be suspect, although not automatically so far from reality that you should discard them.

Third, you need to match your assumptions to the real data-generating process. This is the danger in testing for heteroskedasticity. If the data are heteroskedastic but the test fails to reject the null, then you end up with a bad set of assumptions. If the data are homoskedastic but the test rejects the null, then you also end up with a bad set of assumptions.

The actual fix for problems like that is to approach the assumptions logically. That is why there is no single book or reference set for this problem.

Consider diamond prices. Simple logic tells you that they are probably heteroskedastic with respect to size, though in an unusual way.

Very small diamonds have only two purposes: industrial uses such as drilling, and additions to jewelry. As jewelry, their price is very high; as an industrial material, it is very low.

Most medium-sized diamonds are either already close to the size desired for jewelry or need to be cut into very small pieces for industrial use. The bulk of them, however, will be cut to specification as medium-sized jewelry diamonds, so it is easier to estimate their value as is.

Very large diamonds might remain highly valued as very large diamonds (think of the Hope Diamond), or they may be cut into hundreds of small diamonds for industrial use.

The result is a "bowtie" pattern of heteroskedasticity: small diamonds and very large diamonds have enormous price variability, while medium diamonds are quite predictable as to their final cut and value and so are priced in a much narrower range.
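A small simulation can make the bowtie pattern concrete. The price formula and every number below are entirely made up, not real diamond data:

```python
import random
import statistics

random.seed(2)

def simulated_price(size):
    """Hypothetical price: a linear trend plus noise whose spread is
    widest far from a 'medium' size of 1 carat (the bowtie shape)."""
    base = 1000 * size
    spread = 300 * abs(size - 1.0) + 20
    return random.gauss(base, spread)

sizes = [random.uniform(0.2, 1.8) for _ in range(5000)]
prices = [simulated_price(s) for s in sizes]

def residual_sd(lo, hi):
    """Standard deviation of price residuals for sizes in [lo, hi)."""
    res = [p - 1000 * s for s, p in zip(sizes, prices) if lo <= s < hi]
    return statistics.pstdev(res)

small = residual_sd(0.2, 0.6)
medium = residual_sd(0.8, 1.2)
large = residual_sd(1.4, 1.8)
print(round(small), round(medium), round(large))
```

The residual spread is large at both ends of the size range and small in the middle, which is the bowtie shape described above.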

You should never test for heteroskedasticity with diamonds because there is a way to think this through. Testing should be reserved for things that you have no subject matter knowledge regarding.

Most p-values are robust because most Frequentist estimators are minimax estimators, that is to say, the estimator minimizes the maximum amount of risk that you must take under a specified utility function.

As a result, p-values tend to be conservative in their use of information and any resulting inference.

The reason that there is no one list is that if you can assert something about the population as true, such as diamond prices being logically heteroskedastic in size, then you can condition your model with that knowledge.

As a result, if you were to test diamond prices and they tested as homoskedastic, then you could ignore the test result as irrelevant.

There isn't a list. Instead, you need to determine what researchers in a narrow domain do.

Let me give you one more example to see why that matters.

If a brick-and-mortar retailer needs to alter its revenue, it can issue coupons. It has the ability to control its revenue and its sales volume.

Electrical utilities, generally, cannot control their volume or price. Even if they could mail out coupons encouraging customers to use additional electricity, how would anyone monitor such a sale? Electric utilities control their costs, not their revenue.

That discussion alone tells you that the accounting information that management sees as important for the marginal decision differs between the two. If you give them a common specification, then your model will be misspecified and your p-values will be off: you omitted a variable, industry type.
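A sketch of that omitted-variable problem, with a made-up industry dummy and hypothetical coefficients. Both industries share the same true slope on x, but they have different intercepts and systematically different x, so pooling them without the dummy biases the slope:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: industry 0 vs. industry 1, true common slope = 1,
# but industry 1 has an intercept shift of 5 and larger x on average.
n = 2000
industry = rng.integers(0, 2, size=n)
x = rng.uniform(0, 1, size=n) + industry
y = 1.0 * x + 5.0 * industry + rng.normal(0, 1, size=n)

# Misspecified model: omit the industry dummy.
X_bad = np.column_stack([np.ones(n), x])
slope_bad = np.linalg.lstsq(X_bad, y, rcond=None)[0][1]

# Correct specification: include the dummy.
X_good = np.column_stack([np.ones(n), x, industry])
slope_good = np.linalg.lstsq(X_good, y, rcond=None)[0][1]

print(round(slope_bad, 2), round(slope_good, 2))  # biased vs. roughly 1
```

The pooled regression loads the intercept difference onto the slope, so every inference built on that slope, its p-value included, inherits the misspecification.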

Finding correct p-values is about more than overcoming technical obstacles such as multicollinearity and heteroskedasticity; it is about understanding your domain of research.

Fortunately, p-values are usually robust. If you set $\alpha=.05$ and $p<.01$, then you are probably safe. On the other hand, if $p<.04999$ and your work is a bit shaky, then do not feel confident even if you followed the technical rules.

There is no one book, no simple list. There are just domains where researchers discover unexpected things and correct for them.

Correct answer by Dave Harris on May 6, 2021
