Tag Archives: bayesian statistics


I had a very full week last week, with the annual Bayes on the Beach (BOB) at the Gold Coast (Mon-Wed) and Bayesian Optimal Design of Experiments (BODE) on Friday.

BOB is an annual workshop/retreat, run by Kerrie Mengersen and the BRAG group at QUT, that brings together a bunch of Australian and international statisticians for a few days of workshops, tutorials, presentations and fun in the sun. This year was, I think, my fourth year at BOB.

One of the recurring features is the workshop sessions, where around three researchers each pose a problem to the group and everyone decides which one they’re going to work on. This year I was asked to present a problem based on the air quality research I do and so my little group worked on the issue of how to build a predictive model of indoor PM10 based on meteorology, outdoor PM10 and temporal information. We were fortunate to have Di Cook in our group, who did a lot of interesting visual analysis of the data (she later presented a tutorial on how to use ggplot and R Markdown). We ended up discussing why tree models may not be such a great idea, the difference in autocorrelation and the usefulness of distributed lag models. It gave me a lot to think about and I hope that everyone found it as valuable as I did.

The two other workshop groups worked on ranking the papers of Professor Richard Boys (one of the keynote speakers) and building a Bayesian Network model of PhD completion time. Both groups were better attended than mine, which I put down to the idea that those two were “fun” workshops and mine sounded a lot like work. Still, a diverse range of workshops means something for everyone.

James McGree (QUT) asked me if I could come to the BODE workshop to discuss some open challenges in air quality research with regard to experimental design. I gave a brief overview of regulatory monitoring and the UPTECH project’s random spatial selection. I then brought in the idea that low cost sensors give us the opportunity to measure in many places at once, but that we still need to sort out where to measure if we want to characterise human exposure to air pollution. While it was a small group, I did get to have a good chat with the attendees about some possible ways forward. It was also good to see Julian Caley (AIMS) talk about monitoring on the Great Barrier Reef, Professor Tony Pettitt (QUT) talk about sampling for intractable likelihoods and Tristan Perez (QUT) discuss the interplay between experimental design and the use of robots.

It’s been a great end to the year to spend it in the company of statisticians working on all sorts of interesting problems. While I do enjoy my air quality work, and R usage is increasing at ILAQH, it’s an entirely different culture to being around people who spend their time working out whether they’re better off with data.table and reshape2 or dplyr and tidyr.

Marrying differential equations and regression

Professor Fabrizio Ruggeri (Milan) visited the Institute for Future Environments for a little while in late 2013. He has been appointed as Adjunct Professor to the Institute and gave a public talk with a brief overview of a few of his research interests. Stochastic modelling of physical systems is something I was exposed to in undergrad when a good friend of mine, Matt Begun (who it turns out is doing a PhD under Professor Guy Marks, with whom ILAQH collaborates), suggested we do a joint Honours project where we each tackled the same problem but from different points of view, me as a mathematical modeller, him as a Bayesian statistician. It didn’t eventuate but it had stuck in my mind as an interesting topic.

In SEB113 we go through some non-linear regression models and the mathematical models that give rise to them. Regression typically features a fixed equation and variable parameters, whereas the mathematical modelling I’ve been exposed to features fixed parameters (elicited from lab experiments, previous studies, etc.) and numerical simulation of a differential equation to solve the system (as analytic methods aren’t always easy to employ). I found myself thinking “I wonder if there’s a way of doing both at once” and then shelved the thought because there was no way I would have the time to go and thoroughly research it.

Having spent a bit of time thinking about it, I’ve had a crack at solving an ODE within a Bayesian regression model (Euler’s method in JAGS) for logistic growth and the Lotka-Volterra equations. I’ve started having some discussions with other mathematicians about how we marry these two ideas and it looks like I’ll be able to start redeveloping my mathematical modelling knowledge.
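As a rough illustration of the numerical side of this, here is a minimal sketch of Euler’s method for logistic growth, written in Python rather than JAGS. The function names and the parameter values (r, K, the initial population) are made up for the example and are not taken from the model described above. Because logistic growth has a closed-form solution, the Euler approximation can be checked against it.

```python
import math

def logistic_exact(t, r, K, n0):
    """Closed-form solution of dN/dt = r N (1 - N/K) with N(0) = n0."""
    return K / (1.0 + (K - n0) / n0 * math.exp(-r * t))

def logistic_euler(r, K, n0, t_end, dt):
    """Approximate N(t_end) with forward Euler steps of size dt."""
    n_steps = round(t_end / dt)
    n = n0
    for _ in range(n_steps):
        # Euler update: N_{k+1} = N_k + dt * f(N_k)
        n += dt * r * n * (1.0 - n / K)
    return n

# Illustrative values only (assumptions, not estimates from any data).
r, K, n0, t_end = 0.5, 100.0, 10.0, 10.0
exact = logistic_exact(t_end, r, K, n0)
err_coarse = abs(logistic_euler(r, K, n0, t_end, 0.5) - exact)
err_fine = abs(logistic_euler(r, K, n0, t_end, 0.05) - exact)

print(f"exact N({t_end}) = {exact:.3f}")
print(f"Euler error with dt = 0.5:  {err_coarse:.4f}")
print(f"Euler error with dt = 0.05: {err_fine:.4f}")
```

Shrinking the step size shrinks the error, which is the trade-off to keep in mind when the same update rule is written inside a statistical model: a finer grid costs more computation per likelihood evaluation.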

This is somewhere I think applied statistics has a huge role to play in applied mathematical modelling. Mathematicians shouldn’t be constraining themselves to iterating over a grid of point estimates of parameters, then choosing the one which minimises some Lp-norm (at least not without something like Approximate Bayesian Computation).

I mean, why explore regions of the parameter space that are unlikely to yield simulations that match up with the data? If you’re going to run a bunch of simulations, it should be done with the aim of not just finding the most probable values but characterising uncertainty in the parameters. A grid of values representing a very structured form of non-random prior won’t give you that. Finding the maximum with some sort of gradient-based method will give you the most probable values but, again, doesn’t characterise uncertainty. Sometimes we don’t care about that uncertainty, but when we do we’re far better off using statistics and using it properly.
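To make the contrast with a grid search concrete, here is a minimal sketch of a random-walk Metropolis sampler for a single model parameter, the growth rate of a logistic curve. It is written in Python, not JAGS, and everything in it is an assumption for illustration: the data are simulated, the carrying capacity, initial population and noise level are treated as known, and the prior on the rate is flat over positive values.

```python
import math
import random
import statistics

def logistic(t, r, K=100.0, n0=10.0):
    """Closed-form logistic growth curve; K and n0 are treated as known."""
    return K / (1.0 + (K - n0) / n0 * math.exp(-r * t))

def log_likelihood(r, times, obs, sigma):
    """Gaussian log-likelihood (up to a constant) of the data given rate r."""
    sse = sum((y - logistic(t, r)) ** 2 for t, y in zip(times, obs))
    return -sse / (2.0 * sigma ** 2)

def metropolis(times, obs, sigma, r_init, step, n_iter, rng):
    """Random-walk Metropolis over r with a flat prior on r > 0."""
    r, ll = r_init, log_likelihood(r_init, times, obs, sigma)
    draws = []
    for _ in range(n_iter):
        proposal = r + rng.gauss(0.0, step)
        if proposal > 0:
            ll_prop = log_likelihood(proposal, times, obs, sigma)
            # Accept with probability min(1, likelihood ratio).
            if rng.random() < math.exp(min(0.0, ll_prop - ll)):
                r, ll = proposal, ll_prop
        draws.append(r)
    return draws

rng = random.Random(1)
true_r, sigma = 0.5, 2.0                    # made-up values for the demo
times = [0.5 * i for i in range(21)]        # t = 0, 0.5, ..., 10
obs = [logistic(t, true_r) + rng.gauss(0.0, sigma) for t in times]

draws = metropolis(times, obs, sigma, r_init=0.3, step=0.01, n_iter=6000, rng=rng)
kept = sorted(draws[1000:])                 # discard burn-in
post_mean = statistics.fmean(kept)
lower = kept[int(0.025 * len(kept))]
upper = kept[int(0.975 * len(kept))]
print(f"posterior mean of r: {post_mean:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```

The sampler spends almost all of its time in the region of parameter space that is consistent with the data, and the retained draws give an interval for the rate rather than a single best value.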

Two pieces of good news this week

The full paper from the EMAC2013 conference last year is now available online. If you’re interested in the statistical methodology we used for estimating the inhaled dose of particles by students in the UPTECH project, you should check out our paper at the ANZIAM Journal (click the link that says “PDF” down the bottom under Full Text).

More importantly, though, we were successful in applying for an ARC Discovery Project! This project will run for three years and combines spatio-temporal statistical modelling, sensor miniaturisation and mobile phone technologies to allow people to minimise their exposure to air pollution. Our summary of the project, from the list of successful projects:

This interdisciplinary project aims to develop a personalised air pollution exposure monitoring system, leveraging the ubiquitousness and advancements in mobile phone technology and state of the art miniaturisation of monitoring sensors, data transmission and analysis. Airborne pollution is one of the top contemporary risks faced by humans; however, at present individuals have no way to recognise that they are at risk or need to protect themselves. It is expected that the outcome will empower individuals to control and minimise their own exposures. This is expected to lead to significant national socioeconomic benefits and bring global advancement in acquiring and utilising environmental information.

Other people at ILAQH were also successful in getting a Discovery Project grant looking at new particle formation and cloud formation in the Great Barrier Reef. I won’t be involved in that project but it sounds fascinating.

Posterior samples

SEB113 students really seemed to enjoy looking at mathematical modelling last week. The Lotka-Volterra equations continue to be a good teaching tool. A student pointed out that when reviewing the limit idea for derivatives it’d be useful to show it with approximating the circumference of a circle using a polygon. So I knocked this up:
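The polygon idea can also be sketched numerically. The following Python snippet is an illustration written for this purpose, not the original graphic; it assumes a regular polygon inscribed in a unit circle, and the function name is hypothetical.

```python
import math

def inscribed_polygon_perimeter(n_sides, radius=1.0):
    """Perimeter of a regular n-gon inscribed in a circle of the given radius."""
    # Each side is a chord subtending an angle of 2*pi/n at the centre,
    # so its length is 2 * radius * sin(pi / n).
    return n_sides * 2.0 * radius * math.sin(math.pi / n_sides)

circumference = 2.0 * math.pi
side_counts = (3, 6, 12, 48, 192, 768)
perimeters = [inscribed_polygon_perimeter(n) for n in side_counts]

for n, p in zip(side_counts, perimeters):
    print(f"n = {n:4d}  perimeter = {p:.6f}  shortfall = {circumference - p:.6f}")
```

The perimeter climbs towards the circumference as the number of sides grows, in the same way that a secant slope approaches the derivative as the step shrinks.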


Are you interested in big data and/or air quality? Consider doing a PhD with me.

This week I showed in the workshop how Markov chains are a neat application of linear algebra for dealing with probability. We used this interactive visualisation to investigate what happens as the transition probabilities change.
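For readers without access to the visualisation, here is a minimal Python sketch of the linear algebra involved. The two-state transition matrix is invented for the example. The distribution over states is a row vector, one step of the chain is a vector-matrix product, and repeating the step approximates the stationary distribution.

```python
def step(dist, P):
    """One step of the chain: row vector `dist` times transition matrix `P`."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, n_steps=200):
    """Approximate the stationary distribution by repeated multiplication."""
    dist = [1.0] + [0.0] * (len(P) - 1)   # start with all mass in state 0
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist

# Hypothetical transition probabilities; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

pi = stationary(P)
print("stationary distribution:", [round(p, 4) for p in pi])

# For a two-state chain with P[0][1] = a and P[1][0] = b, the stationary
# distribution is (b / (a + b), a / (a + b)).
a, b = P[0][1], P[1][0]
print("closed form:            ", [round(b / (a + b), 4), round(a / (a + b), 4)])
```

Changing the entries of `P` (while keeping each row summing to one) and rerunning shows how the long-run behaviour shifts with the transition probabilities.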

Zoubin Ghahramani has written a really nice review paper on Bayesian non-parametrics that I recommend checking out if you’re interested in the new modelling techniques that have been coming out in the last few years for complex data sets.

Exercism.io is a new service for learning how to master programming by getting feedback on exercises.

The problem with p values

A coworker sent me this article about alternatives to the default 0.05 p value in hypothesis testing as a way to improve the corpus of published articles so that we can actually expect reproducibility and have a bit more faith that these results are meaningful. The article is based on a paper published in the Proceedings of the National Academy of Sciences which talks about mapping Bayes Factors to p values for hypothesis tests so that there’s a way to think about the strength of the evidence.

The more I do and teach statistics the more I detest frequentist hypothesis testing (including whether a regression coefficient is zero) as a means of describing whether or not something plays a “significant” role in explaining some physical phenomenon. In fact, the entire idea of statistical significance sits ill with me because the way we tend to view it is that 0.051 is not significant and 0.049 is significant, even though there’s only a very small difference between the two. I guess if you’re dealing with cutoffs you’ve got to put the cutoff somewhere, but turning something which by its very nature deals with uncertainty into a set of rigid rules about what’s significant and what’s not seems pretty stupid.

My distaste for frequentist methods means that even for simple linear regressions I’ll fire up JAGS in R and fit a Bayesian model, because I fundamentally disagree with the idea of an unknown but fixed true parameter. Further to this, the nuances of p values being distributed uniformly under the null hypothesis mean that we can very quickly make incorrect statements.
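The uniformity of p values under the null hypothesis is easy to check by simulation. The sketch below is in Python and uses a two-sided z-test with known standard deviation purely for simplicity; the sample size, number of experiments and function name are arbitrary choices for the illustration.

```python
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p value for H0: mean = mu0, with known standard deviation."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

rng = random.Random(42)
n_experiments, n_obs = 5000, 20

# Every data set is generated with the null hypothesis true (mean 0, sd 1).
p_values = [
    z_test_p_value([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
    for _ in range(n_experiments)
]

frac_below_005 = sum(p < 0.05 for p in p_values) / n_experiments
deciles = [sum(p < k / 10 for p in p_values) / n_experiments for k in range(1, 10)]

print(f"fraction of p values below 0.05: {frac_below_005:.3f}")
print("fraction below 0.1, 0.2, ..., 0.9:", [round(d, 3) for d in deciles])
```

Roughly one experiment in twenty comes out “significant” at the 0.05 level even though there is nothing to find, and a p value of 0.03 is no less likely under the null than one of 0.73.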

I agree with the author of the article that shifting hypothesis testing p value goal posts won’t achieve what we want, and I’ll give the paper a closer read. For the time being, I’ll continue to just mull this over and grumble when people say “statistically significant” without any reference to a significance level.

NB: this post has been in an unfinished state since last November, when the paper started getting media coverage.

Lotka-Volterra and Bayesian statistics and teaching

One of the standard population dynamics models that I learned in my undergrad mathematical modelling units was the Lotka-Volterra equations. These represent a very simple set of assumptions about populations, and while they don’t necessarily give physically realistic population trajectories they’re an interesting introduction to the idea that systems of differential equations don’t necessarily have an explicit solution.

The assumptions are essentially: prey grow exponentially in the absence of predators, predation happens at a rate proportional to the product of the predator and prey populations, birth of predators is dependent on the product of predator and prey populations, and predators die off exponentially in the absence of prey. In SEB113 we cover non-linear regressions, the mathematical models that lead to them, and then show that mathematical models don’t always yield a nice function. We look at the equilibrium solution and then show that trajectories orbit around it rather than tending towards (or away from) it. We also look at what happens to the trajectories as we change the relative size of the rate parameters.
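A minimal Python sketch of these assumptions follows. The symbols alpha, beta, delta and gamma, the parameter values and the starting populations are assumptions made for the illustration, not values used in SEB113. The snippet checks that the coexistence equilibrium has zero rate of change, and uses a fourth-order Runge-Kutta integrator to show that a trajectory keeps a conserved quantity constant, which is why it circles the equilibrium instead of converging to it.

```python
import math

def lv_rhs(x, y, alpha, beta, delta, gamma):
    """dx/dt = alpha*x - beta*x*y (prey), dy/dt = delta*x*y - gamma*y (predators)."""
    return alpha * x - beta * x * y, delta * x * y - gamma * y

def rk4_step(x, y, dt, params):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lv_rhs(x, y, *params)
    k2 = lv_rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], *params)
    k3 = lv_rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], *params)
    k4 = lv_rhs(x + dt * k3[0], y + dt * k3[1], *params)
    x_new = x + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    y_new = y + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x_new, y_new

def invariant(x, y, alpha, beta, delta, gamma):
    """Quantity that stays constant along any Lotka-Volterra trajectory."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)

params = (1.0, 0.1, 0.075, 1.5)            # hypothetical rate parameters
alpha, beta, delta, gamma = params

# Coexistence equilibrium: both derivatives vanish here.
x_star, y_star = gamma / delta, alpha / beta
dx_eq, dy_eq = lv_rhs(x_star, y_star, *params)

x, y = 10.0, 5.0                           # start away from the equilibrium
v_start = invariant(x, y, *params)
prey = [x]
for _ in range(2000):                      # dt = 0.01, so t runs from 0 to 20
    x, y = rk4_step(x, y, 0.01, params)
    prey.append(x)
v_end = invariant(x, y, *params)

print(f"equilibrium: prey = {x_star:.2f}, predators = {y_star:.2f}")
print(f"prey range over the run: {min(prey):.2f} to {max(prey):.2f}")
print(f"change in invariant: {abs(v_end - v_start):.2e}")
```

Changing the ratios between the rate parameters moves the equilibrium and reshapes the orbit, which is the behaviour explored in the unit.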

Last time we did the topic, I posted about using the logistic growth model for our Problem Solving Task and it was pointed out to me that the model has a closed form solution, so we don’t explicitly need to use a numerical solution method. This time around I’ve been playing with using Euler’s method inside JAGS to fit the Lotka-Volterra system to some simulated data from sinusoidal functions (with the same period). I’ve put a bit more effort into the predictive side of the model, though. After obtaining posterior distributions for the parameters (and initial values) I generate simulations with lsode in R, where the parameter values are sampled from the posteriors. The figure below shows the median and 95% CI for the posterior predictive populations as well as points showing the simulated data.

The predictions get more variable as time goes on, as the uncertainty in the parameter values changes the period of the cycles that the Lotka-Volterra system exhibits. This reminds me of a chat I was having with a statistics PhD student earlier this week about sensitivity of models to data. The student’s context is clustering of data using overfitted mixtures, but I ended up digressing and talking about Edward Lorenz’s discovery of chaos theory through a meteorological model that was very sensitive to small changes in its input values (in his case, rounded initial conditions). The variability in the parameter values in the posterior gives rise to similar behaviour, as both Lorenz’s work and my little example in JAGS involve variation in input values for deterministic modelling. Mine was deliberate, though, so isn’t as exciting or groundbreaking a discovery as Lorenz’s, but we both come to the same conclusion: forecasting is of limited use when your model is sensitive to small variations in its inputs. As time goes on, my credible intervals will likely end up being centred on the equilibrium solution and the uncertainty in the period of the solution (due to changing coefficient ratios) will result in very wide credible intervals.
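The widening of the intervals can be reproduced with a small Python sketch. This is not the JAGS and lsode workflow described above. The “posterior draws” here are simply normal perturbations of two hypothetical rate parameters, standing in for real posterior samples, and the integrator is a hand-written fourth-order Runge-Kutta scheme.

```python
import random

def lv_rhs(x, y, alpha, beta, delta, gamma):
    """Lotka-Volterra derivatives for prey x and predators y."""
    return alpha * x - beta * x * y, delta * x * y - gamma * y

def simulate_prey(params, x0=10.0, y0=5.0, t_end=30.0, dt=0.02, every=50):
    """Integrate with RK4 and return prey numbers at t = 1, 2, ..., t_end."""
    x, y, out = x0, y0, []
    for i in range(1, round(t_end / dt) + 1):
        k1 = lv_rhs(x, y, *params)
        k2 = lv_rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], *params)
        k3 = lv_rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], *params)
        k4 = lv_rhs(x + dt * k3[0], y + dt * k3[1], *params)
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        if i % every == 0:                 # every 50 steps of 0.02 is one time unit
            out.append(x)
    return out

def interval_width(values, lo=0.025, hi=0.975):
    """Width of the central 95% interval of a list of simulated values."""
    s = sorted(values)
    return s[int(hi * (len(s) - 1))] - s[int(lo * (len(s) - 1))]

rng = random.Random(7)
# Stand-in for posterior samples: about 5% relative uncertainty in alpha and gamma.
draws = [(rng.gauss(1.0, 0.05), 0.1, 0.075, rng.gauss(1.5, 0.075)) for _ in range(100)]
paths = [simulate_prey(p) for p in draws]

widths = [interval_width([path[k] for path in paths]) for k in range(30)]
early_width = widths[0]                    # interval width at t = 1
late_width = max(widths[19:])              # widest interval for t = 20, ..., 30

print(f"95% interval width for prey at t = 1: {early_width:.2f}")
print(f"widest 95% interval for t in 20..30:  {late_width:.2f}")
```

Each draw produces a cycle with a slightly different period, so the simulated trajectories drift out of phase and the interval across draws grows with time, even though every individual trajectory is perfectly regular.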

It’s been a fun little experiment again, and I’m getting more and more interested in combining statistics and differential equations, as it’s a blend of pretty much all of my prior study. The next step would be to use something like MATLAB with a custom Gibbs/Metropolis-Hastings scheme to bring in more of the computational mathematics I took. It’d be interesting to see if there’s space for this sort of modelling in the Mathematical Sciences School’s teaching programs as it combines some topics that aren’t typically taught together. I’ve heard murmurings of further computational statistics classes but haven’t been involved with any planning.

Running Bayesian models

I came across a post via r/Bayes about different ways to run Bayesian hierarchical linear models in R, a topic I talked about recently at a two-day workshop on using R for epidemiology. Rasmus Bååth’s post details the use of JAGS with rjags, Stan with rstan and LaplacesDemon.

JAGS (well, rjags) has been the staple for most of my hierarchical linear modelling needs over the last few years. It runs within R easily, is written in C++ (so is relatively fast), spits out something that the coda package can work with quite easily, and, above all, makes it very easy to specify models and priors. Using JAGS means never having to derive a Gibbs sampler or write out a Metropolis-Hastings algorithm that requires you to really think about jumping rules. It’s Bayesian statistics for those who don’t have the time/inclination to do it “properly”. It has a few drawbacks, though: improper priors can’t be specified with distributions like dflat() (but this could be seen as a feature rather than a bug), and defining a Conditional Autoregressive prior requires specifying it as a multivariate Gaussian. That said, it’s far quicker than using OpenBUGS and JAGS installs fine on any platform.