I came across a post via r/Bayes about different ways to run Bayesian hierarchical linear models in R, a topic I talked about recently at a two-day workshop on using R for epidemiology. Rasmus Bååth’s post details the use of JAGS with rjags, STAN with rstan, and LaplacesDemon.

JAGS (well, rjags) has been the staple for most of my hierarchical linear modelling needs over the last few years. It runs within R easily, is written in C++ (so is relatively fast), spits out something that the coda package can work with quite easily, and, above all, makes it very easy to specify models and priors. Using JAGS means never having to derive a Gibbs sampler or write out a Metropolis-Hastings algorithm that requires you to really think about jumping rules. It’s Bayesian statistics for those who don’t have the time/inclination to do it “properly”. It has a few drawbacks, though, such as not being able to specify improper priors with distributions like dflat() (though this could be seen as a feature rather than a bug), and defining a conditional autoregressive (CAR) prior requires specifying it as a multivariate Gaussian. That said, it’s far quicker than using OpenBUGS and JAGS installs fine on any platform.
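To give a flavour of how easy the model/prior specification is, here’s a minimal sketch of a random-intercept hierarchical model run through rjags. The data names (y, group, N, J) and the prior choices are illustrative, not from any particular analysis:

```r
library(rjags)
library(coda)

model_string <- "
model {
  for (i in 1:N) {
    y[i] ~ dnorm(alpha[group[i]], tau.y)      # observation-level likelihood
  }
  for (j in 1:J) {
    alpha[j] ~ dnorm(mu.alpha, tau.alpha)     # group-level intercepts
  }
  mu.alpha  ~ dnorm(0, 1.0E-4)                # vague but proper priors
  tau.y     ~ dgamma(0.001, 0.001)            # (JAGS won't accept improper ones)
  tau.alpha ~ dgamma(0.001, 0.001)
}"

jm <- jags.model(textConnection(model_string),
                 data = list(y = y, group = group, N = N, J = J),
                 n.chains = 3)
update(jm, 1000)                               # burn-in
samples <- coda.samples(jm, c("mu.alpha", "tau.y", "tau.alpha"),
                        n.iter = 5000)
summary(samples)                               # coda handles the output directly
```

Note that every stochastic node gets an explicit (proper) prior; that’s the JAGS constraint mentioned above.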

After reading the post’s section on STAN I decided that it was time to give it another go. Downloading the latest version of R and Rtools would surely give me a better experience than last time, when it wouldn’t even detect the compiler properly. Putting everything in DOS-friendly file structures with short names meant that everything went off without a hitch and I was able to get the toy example running. Andrew Gelman, one of the developers of STAN, has a post on his blog by Phillip Price about the eight schools example, a really good introduction to hierarchical linear modelling and meta-analysis. STAN is a bit more forgiving than JAGS when it comes to priors; any stochastic node that isn’t given a prior is given a flat prior by default. Whether or not Thiago Martins and Dan Simpson would be happy with that remains to be seen. STAN looks very promising, and it’s been included in the third edition of Gelman’s BDA book (which I still need to buy).
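For anyone wanting to try it, here’s the standard eight schools model (the centred parameterisation from the Stan documentation) run from R via rstan. The data are the published estimates and standard errors from Rubin’s original example; note that mu and tau get no explicit prior, so they receive STAN’s default flat prior:

```r
library(rstan)

schools_code <- "
data {
  int<lower=0> J;          // number of schools
  real y[J];               // estimated treatment effects
  real<lower=0> sigma[J];  // standard errors of the estimates
}
parameters {
  real mu;                 // population mean effect (flat prior by default)
  real<lower=0> tau;       // between-school sd (flat prior by default)
  real theta[J];           // school-level effects
}
model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}"

schools_data <- list(J = 8,
                     y = c(28, 8, -3, 7, -1, 1, 18, 12),
                     sigma = c(15, 10, 16, 11, 9, 11, 10, 18))

fit <- stan(model_code = schools_code, data = schools_data,
            iter = 2000, chains = 4)
print(fit)
```

Those implicit flat priors are exactly the convenience (or hazard, depending on your view) discussed above.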

The other strategy I tried recently was coding up a Metropolis-Hastings sampler using the guide from Florian Hartig at Theoretical Ecology. Choosing a jumping rule was difficult, as I had different parameters to deal with and a single jumping rule wouldn’t do. I tried adaptive MCMC, went down the rabbit hole of log-precisions and block-updates, and ended up with very poor mixing and convergence. Finding a decent jumping rule is probably what prevents me from going back to using adaptive MCMC as I did for a book chapter on Bayesian splines.
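The basic machinery, in the spirit of Hartig’s guide, is only a few lines of R. This is a toy single-parameter version (a normal mean with known sd, and simulated data) rather than the multi-parameter model I was fighting with; the proposal sd on the rnorm() line is the “jumping rule”, and it’s exactly the bit that needs tuning per parameter:

```r
set.seed(42)
y <- rnorm(50, mean = 3, sd = 2)               # simulated data

# log posterior: N(0, 10^2) prior on mu, known sd = 2
log_posterior <- function(mu) {
  sum(dnorm(y, mu, 2, log = TRUE)) + dnorm(mu, 0, 10, log = TRUE)
}

n_iter <- 10000
mu <- numeric(n_iter)
mu[1] <- 0
for (t in 2:n_iter) {
  proposal <- rnorm(1, mu[t - 1], 0.5)          # jumping rule: proposal sd = 0.5
  log_ratio <- log_posterior(proposal) - log_posterior(mu[t - 1])
  mu[t] <- if (log(runif(1)) < log_ratio) proposal else mu[t - 1]
}
mean(mu[-(1:1000)])                             # posterior mean after burn-in
```

With several parameters on wildly different scales (means, log-precisions, spline coefficients) you need a different proposal scale for each, which is where things started going wrong for me.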

I eventually settled on writing out a full Gibbs scheme and coding it up in MATLAB. This is very fast (MATLAB’s better at loops than R is) and gives me good convergence. I’m not a fan of MATLAB’s plotting, though, so may end up importing the results into R so I’ve got ggplot2 handy. Big thanks to Zoé van Havre for her help with the Gibbs scheme.
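The structure of such a scheme is easy enough to sketch; here is a conjugate Gibbs sampler for a simple normal model (unknown mean and precision) written in R rather than MATLAB, with standard full conditionals. This is an illustrative toy, not the actual scheme Zoé helped me with:

```r
set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)
n <- length(y)

# Priors: mu ~ N(mu0, 1/tau0), tau ~ Gamma(a, b)
mu0 <- 0; tau0 <- 1 / 100^2; a <- 0.001; b <- 0.001

n_iter <- 5000
mu <- tau <- numeric(n_iter)
mu[1] <- mean(y); tau[1] <- 1

for (t in 2:n_iter) {
  # full conditional for mu | tau, y  (normal)
  prec <- tau0 + n * tau[t - 1]
  m <- (tau0 * mu0 + tau[t - 1] * sum(y)) / prec
  mu[t] <- rnorm(1, m, sqrt(1 / prec))
  # full conditional for tau | mu, y  (gamma)
  tau[t] <- rgamma(1, a + n / 2, b + sum((y - mu[t])^2) / 2)
}

mean(mu[-(1:500)])           # posterior mean of mu
1 / sqrt(mean(tau[-(1:500)]))  # rough posterior estimate of sd
```

In R this loop is slow for big models, which is why MATLAB (or Rcpp) wins once the scheme gets large.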

I’ve got a PhD student who’s going to be dealing with Bayesian modelling. He’s picking up R quite quickly and is doing his best with Bayesian statistics. It’s all in WinBUGS at the moment, though, which is going to limit the amount of progress we can make. I’d love to be able to code up a bunch of JAGS models and let them run on the supercomputer once we get our great big data set ready for a well-planned set of analyses. I’ve got less time to do the modelling myself these days and find myself wishing I had a clone to do the work. I guess that’s part of the training aspect of PhD supervision: making sure your student can implement an analysis once you’ve described it.

It’s still difficult for me working in a science group as the only statistician, as most of my statistics discussions are people asking for my help rather than us collaborating as equals. I enjoy working with others on interesting modelling problems, and it’d be good to work with other statisticians. While I’m now in the Mathematical Sciences School I don’t think I’ve capitalised yet on the connections I’ve got there in terms of directing my own research down the statistics route. With the UPTECH analysis being the major focus of my research at the moment, it’s tricky to allocate brain space to what I want to be doing next.

Dan Simpson: All I have to say about flat priors is that the idea that they need to be justified any less carefully than “non-flat” priors is dumb. It doesn’t mean you shouldn’t use them, just that they make terrible defaults because people equate flat with “non-informative”, which is demonstrably and spectacularly untrue.

I do however love the idea of STAN (although I’ve never been able to use it on any of my things, but the day will surely come…)

For the types of models you’ve been working on with your air quality stuff, I would point you towards Håvard and Leo’s book: their one-block sampler works really well. One of the problems with that type of model is that there is severe posterior dependence between levels 2 and 3 of the hierarchy, which tends to tank Gibbs samplers. One-block updating schemes fix that problem. There’s some experimental stuff in STAN that’s doing a similar thing (http://arxiv.org/abs/1312.0906 is the corresponding paper, which gives a good illustration of why things tend to go horribly wrong). Re-parameterisations (like non-centred parameterisations) also tend to work, but can be tricky to deploy the right way.