My students are working on their 25% assessment pieces, the Quantitative Workbook. These are group assignments that require students to do a quantitative analysis from start to finish on some ecology data we’ve given them. A few students are struggling with the p value concept, particularly what it means in the R summary.lm() output. I responded to one such student with the following statement. It’s a bit more verbose than I might have liked, but I think it’s important to try to step through it from start to finish. It took me ages to get this as an undergrad.

The hypothesis test that R does and gives you in the regression summary asks:

What is the probability of seeing a test statistic (third column in the output) at least as extreme as what we have if the true value of the parameter were actually zero (this is our null hypothesis)?

Our best estimates of the parameters, given the data we are using with our model (first column in the output), are found by minimising the sum of squared errors between the observed values and the fitted values (see the Normal equations slides from the linear algebra week). Our uncertainty about those estimates is given by the standard error of each estimate (second column in the output), which is related to the standard deviation of the residuals: the more scatter there is around the fitted values, the more uncertain our parameter estimates are. If the standard error is comparable in size to the estimate itself, then our uncertainty may mean we can’t reject the idea that the true value of the parameter is zero (i.e. we may not be able to detect that this variable has an effect).
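To make the first two columns concrete, here’s a minimal Python sketch of what lm() does under the hood. The data here are made up for illustration (not the workbook’s ecology data): a response with a known intercept and slope plus noise.

```python
import numpy as np

# Hypothetical data: 20 observations with true intercept 2.0 and slope 0.8,
# plus normal noise. These numbers are illustrative, not the workbook data.
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 20)
y = 2.0 + 0.8 * x + rng.normal(0, 1.5, size=20)

# Design matrix with an intercept column, as lm() builds internally.
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimates: the solution of the normal equations (X'X) b = X'y.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance, using n - p degrees of freedom (p = 2 parameters here).
resid = y - X @ beta_hat
n, p = X.shape
sigma2 = resid @ resid / (n - p)

# Standard errors: square roots of the diagonal of sigma^2 * (X'X)^-1.
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

print(beta_hat)  # first column of the summary.lm() table: the estimates
print(se)        # second column: the standard errors of those estimates
```

In R, `summary(lm(y ~ x))` reports exactly these two quantities in its first two columns.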

The test statistic (third column) is assumed, under the null hypothesis, to come from a t distribution whose degrees of freedom is the number of data points we started with minus the number of parameters we’ve estimated. The idea of the test statistic coming from a t distribution reflects the notion that our data is a finite sample of all the data that could have been collected if the experiment were repeated an infinite number of times under the same conditions. If the test statistic is really far away from zero, then it’s very improbable that we would observe sampled data like this if the true value of this parameter were zero (i.e. the relevant variable plays no role in explaining the variation in the response variable).
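The third and fourth columns follow directly from the first two. A short Python sketch, using scipy and made-up numbers standing in for one row of the summary table:

```python
from scipy import stats

# Hypothetical values for one coefficient: an estimate, its standard error,
# and the residual degrees of freedom (n data points minus p parameters).
estimate, std_error, df = 0.8, 0.2, 18

# Third column of summary.lm(): how many standard errors the estimate
# sits away from zero.
t_stat = estimate / std_error

# Fourth column: the probability of a t statistic at least this extreme,
# in either direction, if the true parameter were zero.
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(t_stat)   # 4.0 standard errors from zero
print(p_value)  # small: strong evidence against the null hypothesis
```

The factor of 2 is there because the test is two-sided: an estimate far below zero counts as just as “extreme” as one far above it.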

It’s traditional in science to use a cutoff for the p value of 0.05, corresponding to whether a 95% confidence interval covers zero. This is saying “we accept that, even if the true effect were zero, about 1 in every 20 identically conducted experiments would still show an effect this extreme purely by chance”. If your p value, the probability of seeing a test statistic at least as extreme as this if the true value of the parameter is zero, is less than 0.05, then you’ve got evidence to reject the null hypothesis. Sometimes we want to be really confident and we choose a cutoff of 0.01, corresponding to whether a 99% CI covers zero: we then accept at most a 1 in 100 chance of declaring an effect that isn’t really there. If the p value is less than 0.01 then we have evidence to reject the null hypothesis at our 0.01 level. Sometimes we will accept a less stringent cutoff of 0.1 (1 in 10). Whatever level we choose must be stated up front.

So in summary, the hypothesis we are testing is “The true value of the parameter is zero”, and the p value is a probabilistic statement that says “If I assume the true value is zero, what’s the probability of seeing a test statistic (which measures how far my estimate is from zero, relative to my uncertainty about it) at least as extreme as this?”