The problem with p values

A coworker sent me this article about alternatives to the default 0.05 p value threshold in hypothesis testing, as a way to improve the corpus of published articles so that we can actually expect reproducibility and have a bit more faith that published results are meaningful. The article is based on a paper published in the Proceedings of the National Academy of Sciences which talks about mapping Bayes factors to p values for hypothesis tests, so that there’s a way to think about the strength of the evidence.

The more I do and teach statistics, the more I detest frequentist hypothesis testing (including testing whether a regression coefficient is zero) as a means of describing whether or not something plays a “significant” role in explaining some physical phenomenon. In fact, the entire idea of statistical significance sits ill with me, because of the way we tend to view it: 0.051 is not significant and 0.049 is, even though there’s only a very small difference between the two. I guess if you’re dealing with cutoffs you’ve got to put the cutoff somewhere, but turning something which by its very nature deals with uncertainty into a set of rigid rules about what’s significant and what’s not seems pretty stupid.

My distaste for frequentist methods means that even for simple linear regressions I’ll fire up JAGS in R and fit a Bayesian model, because I fundamentally disagree with the idea of an unknown but fixed true parameter. Further to this, the fact that p values are distributed uniformly under the null hypothesis means that we can very quickly make incorrect statements.
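To make the uniform-under-the-null point concrete, here’s a quick toy simulation (mine, not from the paper): two groups drawn from the same distribution, so the null hypothesis is true by construction.

```r
# Two samples from the same normal distribution, so the null
# hypothesis of equal means is true by construction.
set.seed(42)
pvals <- replicate(10000, t.test(rnorm(30), rnorm(30))$p.value)

# Under the null the p values are uniform on (0, 1), so about 5% of
# tests come up "significant" at the 0.05 level despite no effect.
mean(pvals < 0.05)
hist(pvals, breaks = 20, main = "p values under the null")
```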

I agree with the author of the article that shifting the p value goal posts won’t achieve what we want, and I’ll have a closer read of the paper. For the time being, I’ll continue to just mull this over and grumble when people say “statistically significant” without any reference to a significance level.

NB: this post has been in an unfinished state since last November, when the paper started getting media coverage.

Revising another paper

We got a paper back from the reviewers a few days ago, and there are some comments requesting revisions to the explanation of the statistical methods and the analysis. It’s interesting coming back to this paper about a year after I last saw it (it’s been sent around to a few different journals, trying to find a home for it). The PhD student who is the main author got into R and ggplot2 last year and has done some good work with linear mixed effects models and visualisation, but some of the plots are the same sort of thing one might do in Excel (lots of boxplots next to each other rather than making use of small multiples).

So now I get to delve back into some data and analysis that’s about a year old with fresh eyes. Having done more with ggplot2 over the last 12 months, there are some things that I will definitely change. The student and I had a chat this morning about how to tackle it, and we’re trying to choose the best way to split up our data for visualisation: 6 treatments × 4 measurement blocks × 2 measures (PM2.5 mass concentration and PNC) makes for 48 boxplots, density plots or histograms in total.
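For what it’s worth, small multiples are cheap in ggplot2 once the data are in long format. Here’s a minimal sketch, assuming a data frame dat with (hypothetical) columns treatment, block, measure and value:

```r
library(ggplot2)

# 2 measures x 4 blocks of panels, 6 treatments within each panel:
# all 48 boxplots in one display rather than a row of separate plots.
ggplot(dat, aes(x = treatment, y = value)) +
  geom_boxplot() +
  facet_grid(measure ~ block, scales = "free_y")
```

The free y scale matters here because PM2.5 mass concentration and PNC sit on very different scales.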

A paper with another PhD student has had its open discussion finalised now, which means more writing is probably going to happen. I find ACP’s model quite interesting. The articles are peer reviewed, published for discussion, and then revised by the authors for final publication. I guess it spreads the review work out a bit and allows for multiple voices to be heard before final publication, each with a different approach and background.

That feeling when former students contact you

Last year I had a student in SEB113 who came into the subject with a distaste for mathematics and statistics; they struggled with both the statistical concepts and the use of R throughout the semester and looked as though they would rather be anywhere else during the collaborative workshops. This student made it to every lecture and workshop, though, came to enjoy using R for statistical analysis of data, and earned a 7 in the unit.

I just got an email from them asking for a reference for their VRES (Vacation Research Experience Scheme) project application. Not only am I proud of this student for working their butt off to get a 7 in a subject they initially disliked but came to find interesting, I am over the moon to hear that they are interested in undertaking scientific field research. The student mentions how my “passion for teaching completely transformed my (their) view of statistics”, and their passion for the research topic is reflected in the email.

This sort of stuff is probably the most rewarding aspect of lecturing.

Lotka-Volterra and Bayesian statistics and teaching

One of the standard population dynamics models that I learned in my undergrad mathematical modelling units was the Lotka-Volterra equations. These represent a very simple set of assumptions about populations, and while they don’t necessarily give physically realistic population trajectories, they’re an interesting introduction to the idea that systems of differential equations don’t necessarily have an explicit solution.

The assumptions are essentially: prey grow exponentially in the absence of predators, predation happens at a rate proportional to the product of the predator and prey populations, the birth of predators depends on the product of the predator and prey populations, and predators die off exponentially in the absence of prey. In SEB113 we cover non-linear regressions, the mathematical models that lead to them, and then show that mathematical models don’t always yield a nice function. We look at the equilibrium solution and then show that trajectories orbit around it rather than tending towards (or away from) it. We also look at what happens to the trajectories as we change the relative size of the rate parameters.
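Written out, those assumptions give the familiar system (in the usual notation, with x the prey population and y the predators):

```latex
\begin{aligned}
\frac{dx}{dt} &= \alpha x - \beta x y \\
\frac{dy}{dt} &= \delta x y - \gamma y
\end{aligned}
```

The non-trivial equilibrium sits at $(x, y) = (\gamma/\delta, \alpha/\beta)$, and it’s this point that the trajectories orbit around.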

Last time we did the topic, I posted about using the logistic growth model for our Problem Solving Task, and it was pointed out to me that the model has a closed-form solution, so we don’t actually need a numerical solution method. This time around I’ve been playing with using Euler’s method inside JAGS to fit the Lotka-Volterra system to some simulated data from sinusoidal functions (with the same period). I’ve put a bit more effort into the predictive side of the model, though. After obtaining posterior distributions for the parameters (and initial values) I generate simulations with lsode in R, where the parameter values are sampled from the posteriors. The figure below shows the median and 95% credible interval for the posterior predictive populations as well as points showing the simulated data.
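That simulation step looks roughly like this; a sketch only, with post standing in for a (hypothetical) data frame of posterior draws of the parameters and initial values:

```r
library(deSolve)

# Lotka-Volterra right-hand side in the form deSolve expects.
lv <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dx <- alpha * x - beta * x * y   # prey
    dy <- delta * x * y - gamma * y  # predators
    list(c(dx, dy))
  })
}

times <- seq(0, 50, by = 0.1)

# Solve the system once per posterior draw (200 draws here).
sims <- lapply(sample(nrow(post), 200), function(i) {
  lsode(y = c(x = post$x0[i], y = post$y0[i]),
        times = times, func = lv,
        parms = c(alpha = post$alpha[i], beta = post$beta[i],
                  delta = post$delta[i], gamma = post$gamma[i]))
})

# Pointwise median and 95% credible interval for the prey population.
prey <- sapply(sims, function(s) s[, "x"])
apply(prey, 1, quantile, probs = c(0.025, 0.5, 0.975))
```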

[Figure: posterior predictive median and 95% CI for both populations, with the simulated data points overlaid.]

The predictions get more variable as time goes on, as the uncertainty in the parameter values changes the period of the cycles that the Lotka-Volterra system exhibits. This reminds me of a chat I was having with a statistics PhD student earlier this week about the sensitivity of models to data. The student’s context is clustering of data using overfitted mixtures, but I ended up digressing and talking about Edward Lorenz’s discovery of chaos theory through a meteorological model that was very sensitive to small changes in parameter values. The variability in the posterior parameter values gives rise to the same behaviour, as both Lorenz’s work and my little example in JAGS involve variation in input values for deterministic modelling. Mine was deliberate, though, so isn’t as exciting or groundbreaking a discovery as Lorenz’s, but we both come to the same conclusion: forecasting is of limited use when your model is sensitive to small variations in parameters. As time goes on, my credible intervals will likely end up being centred on the equilibrium solution, and the uncertainty in the period of the solution (due to changing coefficient ratios) will result in very wide credible intervals.

It’s been a fun little experiment again, and I’m getting more and more interested in combining statistics and differential equations, as it’s a blend of pretty much all of my prior study. The next step would be to use something like MATLAB with a custom Gibbs/Metropolis-Hastings scheme to bring in more of the computational mathematics I took. It’d be interesting to see if there’s space for this sort of modelling in the Mathematical Sciences School’s teaching programs as it combines some topics that aren’t typically taught together. I’ve heard murmurings of further computational statistics classes but haven’t been involved with any planning.

Paper helicopters

There is no textbook for SEB113 – Quantitative Methods in Science.

It’s not that we haven’t bothered to prescribe one; it’s that nothing seemed to take the approach we decided upon two years ago when planning for the unit started. There are books on statistics for chemistry, statistics for ecology, statistics for physics, statistics for mathematics, etc., but trying to find a general “statistics for science” book that focuses on modelling rather than testing has been difficult.

That said, there are some amazing resources out there if you know where to look, not just for learning statistics but for teaching statistics. One of the most useful that we’ve come across is “Teaching Statistics”, by Andrew Gelman and Deborah Nolan. The book itself is full of advice for things like groupwork, topic order, structure of learning activities, etc., but my favourite thing so far is the paper helicopter experiment.

Running Bayesian models

I came across a post via r/Bayes about different ways to run Bayesian hierarchical linear models in R, a topic I talked about recently at a two-day workshop on using R for epidemiology. Rasmus Bååth’s post details the use of JAGS with rjags, STAN with rstan, and LaplacesDemon.

JAGS (well, rjags) has been the staple for most of my hierarchical linear modelling needs over the last few years. It runs within R easily, is written in C++ (so is relatively fast), spits out something that the coda package can work with quite easily, and, above all, makes it very easy to specify models and priors. Using JAGS means never having to derive a Gibbs sampler or write out a Metropolis-Hastings algorithm that requires you to really think about jumping rules. It’s Bayesian statistics for those who don’t have the time/inclination to do it “properly”. It has a few drawbacks, though: you can’t specify improper priors with distributions like dflat() (though this could be seen as a feature rather than a bug), and defining a Conditional Autoregressive prior requires specifying it as a multivariate Gaussian. That said, it’s far quicker than using OpenBUGS, and JAGS installs fine on any platform.
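As an illustration of why I like it, here’s roughly what a random-intercept linear model looks like in rjags; a sketch with made-up variable names, not code from the workshop:

```r
library(rjags)

# y[i] gets a group-specific intercept and a common slope.
# Note that dnorm() in JAGS is parameterised by mean and *precision*.
model_string <- "
model {
  for (i in 1:N) {
    y[i] ~ dnorm(b0[group[i]] + b1 * x[i], tau)
  }
  for (j in 1:J) {
    b0[j] ~ dnorm(mu0, tau0)   # hierarchical prior on the intercepts
  }
  mu0 ~ dnorm(0, 1e-4)         # proper (if vague) priors only;
  b1  ~ dnorm(0, 1e-4)         # there's no dflat() in JAGS
  tau  ~ dgamma(0.01, 0.01)
  tau0 ~ dgamma(0.01, 0.01)
}"

jm <- jags.model(textConnection(model_string),
                 data = list(y = y, x = x, group = group,
                             N = length(y), J = max(group)),
                 n.chains = 3)
update(jm, 1000)   # burn-in
samples <- coda.samples(jm, c("b1", "mu0", "tau"), n.iter = 5000)
summary(samples)   # the coda package takes over from here
```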

Posterior Samples

Interested in collaborative use of R, MATLAB, etc. for analysis and visualisation within a webpage? Combining plotly and IPython can help you with that.

Cosmopolitan (yes, that Cosmopolitan) has a great article interviewing Emily Graslie, Chief Curiosity Officer at the Field Museum in Chicago. She discusses being an artist and making the transition into science, science education and YouTube stardom.

A few of the PhD students in my lab have asked if I could run an introduction to R session. I’d mentioned the CAR workshop that I’d be doing, but the cost had put them off. Luckily, there are alternatives like DataCamp, Coursera and Lynda. Coursera’s next round of “Data Science”, delivered by Johns Hopkins University, starts next Monday (Course 1 – R Programming). So get in there and learn some R! I’m considering recommending some of these Coursera courses to my current SEB113 students who want to go a bit further with R, but the approach that they take in these online modules is quite different to what we do in SEB113 and I don’t want them to confuse themselves.