I’ve spent the last three days at the Bayes on the Beach conference/workshop in Caloundra. The meeting is an annual event where Bayesian statisticians (usually Australians who know Kerrie Mengersen) get together for a very casual set of talks, workshops and other activities in a beachside town.
We arrived in Caloundra on Tuesday morning and got down to business with a keynote talk from Dave Woods (University of Southampton). Professor Woods’ talk opened a day in which the overarching theme was experimental design and made the point that any design is at least implicitly Bayesian. I really agree with this point because any design is based on the prior experience of the experimenter or is pieced together from a review of what’s been done previously. I don’t know of any situation in which an experimental study was carried out totally at random without choosing appropriate covariate values. I don’t have much of a head for design but I think after seeing the talks on Tuesday (particularly Liz Ryan’s, which others in her session praised as giving a good review of utility) I’m a bit more aware of what it all means. What really helped was Stephen Duffull (University of Otago) in his Thursday talk, where he spelled out D-optimality (wanting to get the best estimate of parameters) and P-optimality (wanting to maximise the probability of a desired outcome, i.e. the effect). D- and P-optimality work in opposite directions quite frequently (killing all your patients doesn’t tell you much about the parameters in your model, for example) but it’s possible to trade them off with a DP-optimal design.
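To make the D-criterion a bit more concrete, here is a minimal sketch of my own (not anything shown in the talks). It assumes the simplest possible setting, a straight-line model y = b0 + b1*x, where the information matrix is proportional to X'X and the D-criterion is its determinant. The function name `d_criterion` and the two example designs are made up for illustration.

```python
def d_criterion(xs):
    """Determinant of X'X for the straight-line model y = b0 + b1*x.

    X has a column of ones and a column of design points, so
    X'X = [[n, sum(x)], [sum(x), sum(x^2)]].
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return n * sxx - sx * sx


# Two hypothetical four-point designs on the interval [-1, 1].
spread = [-1.0, -1.0, 1.0, 1.0]     # points pushed to the ends of the range
clustered = [-0.5, 0.0, 0.0, 0.5]   # points bunched around the middle

print(d_criterion(spread), d_criterion(clustered))
```

A larger determinant corresponds to a smaller joint confidence region for the parameters, so under this assumed model the spread-out design is the better of the two for estimation. A DP-type criterion would additionally weigh in the probability of the outcome you want, which this sketch does not attempt.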
One of my favourite aspects of the Bayes on the Beach meetings is the workshops. At my first Bayes on the Beach (2009) we were shown how to do a Bayesian meta-analysis to combine the results of a bunch of disparate studies that dealt with the same topic. Ever since then, I’ve been looking forward to learning some more statistics or getting to grips with interesting data. Matt Wand (University of Technology Sydney), Julian Caley (Australian Institute of Marine Science) and Sama Low Choy (Queensland University of Technology, my associate supervisor) each pitched a problem that people could have a look at, not necessarily solving it but working towards a solution. Wand presented some really interesting spatio-temporal data on extreme rainfall in NSW, which I was very keen to sign up for, but Kerrie suggested that it might be a bit more of a challenge to do something other than spatio-temporal modelling of environmental data. Sama presented some work that she’s been doing on combining elicited opinions into a subjective prior that represents the aggregate knowledge of experts. I ended up in Caley’s group, where we worked on some data that Julie Vercelloni is dealing with as part of her PhD.
Caley, Vercelloni and Mengersen (among others) are working on coral coverage in the Great Barrier Reef in six sectors that run up the length of the reef (2600 km). Within each sector there are reef shelves and on each shelf there are multiple reefs with multiple measurement patches. The question we attempted to answer was “How different are the long-term trends in coral coverage at these reefs?” Caley has annual data going back to about 1994. Pooled, the data looked very boring, but when plotted (very well by Vercelloni) grouped by reef shelf within sector, they indicated that there might be quite a lot of interesting variation which may not be so straightforward to model. Our group split up into a few subgroups with different approaches and I ended up working with James McKeone and a few others on a model inspired by Cari Kaufman’s functional ANOVA with GP priors. Over the next day or so McKeone took what we’d discussed and written down as a model and came up with quite a general Gibbs sampling scheme that is flexible enough to admit any linear predictor. I’m fairly certain James and I both had P-splines in mind but I did talk later to Matt Wand about O’Sullivan splines and I think it might be conceptually easier to use Wand’s low-rank thin plate smoothers, particularly as Vercelloni is quite new to Bayesian statistics and has a background in ecology rather than computational statistics.
It was interesting to see how the other workshops went at the Thursday afternoon recap. Sama’s group had split into three subgroups, each tackling the issue of elicitation with a different topic and a different angle. I must say that my favourite was the group who did an elicitation of predictions of the outcome of the US Presidential Election. Luisa Hall even managed to elicit my opinion for her survey without me even realising it (we talk a lot about politics)! The other groups asked people how risky a life-saving operation would have to be for them to refuse it, and about the average completion time for PhD students (which Sama gave a talk about on Thursday).
Real time updates for Mean Field Variational Bayes
Matt Wand also gave a talk and tutorial about using Mean Field Variational Bayes (a name he attributes to Mike Jordan) to do live, real-time updates of posterior estimates with streamed data such as stock trading. Rather than going into the content of the talk, I suggest you read the paper he’s written with Tamara Broderick (University of California, Berkeley) and his PhD student Jan Luts (University of Technology Sydney) and check out their website with neat examples, e.g. the Sydney rental market (which I think would be fascinating with the train lines superimposed).
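To give a flavour of what "updating the posterior as data stream in" means, here is a minimal sketch of my own. To be clear, this is not Mean Field Variational Bayes and not the algorithm from the paper: it swaps in the simplest exact conjugate case I could think of (a normal mean with known observation variance), where yesterday's posterior becomes today's prior. All function names and the example numbers are made up for illustration.

```python
def update(mean, precision, y, obs_var=1.0):
    """One streaming update of a normal posterior for an unknown mean.

    Assumes y ~ Normal(mu, obs_var) with obs_var known and a normal
    prior on mu given by (mean, precision).
    """
    new_precision = precision + 1.0 / obs_var
    new_mean = (precision * mean + y / obs_var) / new_precision
    return new_mean, new_precision


def stream_posterior(ys, prior_mean=0.0, prior_precision=1.0, obs_var=1.0):
    """Process observations one at a time, keeping only the current posterior."""
    mean, precision = prior_mean, prior_precision
    for y in ys:
        mean, precision = update(mean, precision, y, obs_var)
    return mean, precision


def batch_posterior(ys, prior_mean=0.0, prior_precision=1.0, obs_var=1.0):
    """The same posterior computed from all the data at once, for comparison."""
    precision = prior_precision + len(ys) / obs_var
    mean = (prior_precision * prior_mean + sum(ys) / obs_var) / precision
    return mean, precision


# Hypothetical stream of four observations.
ys = [1.2, 0.7, 1.9, 1.1]
print(stream_posterior(ys))
print(batch_posterior(ys))
```

The streamed and batch answers agree up to floating-point error, which is the whole point: the old data never need to be stored or revisited. As I understand it, the appeal of the variational approach is getting a similar kind of cheap update for models where no exact conjugate form like this exists.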
Games and Posters
On Tuesday and Wednesday evenings Luisa and I ran some games (Dictionary and Werewolf) before the poster sessions. It’s interesting running games like this with Bayesians because Dictionary is basically a problem of credibility of unknown experts and Werewolf is all about updating an initially uninformative prior with information based on people’s behaviour as they accuse others while trying to avoid being lynched. The poster sessions themselves were quite good, with a wide variety of applications and methodologies being presented. Everyone seemed quite keen to talk about their posters and they were generally of a high quality.
On Thursday afternoon I gave my talk about spatio-temporal modelling of the UPTECH data as a case study for INLA. I got a few questions about the versatility of INLA and my choice of random walk and spline models instead of other bases as well as comments about the spatial modelling I’m doing. Definitely some things to think about for the next papers I write.
The organisers of Bayes on the Beach (Nicole White, Matt Moores, Jannah Baker and Dow Jaemjamrat) all did a really good job and I think everyone not only had a good time but also came away inspired. I had a few minor issues (timekeeping is a perpetual bugbear of mine and I’m yet to go to a conference that runs totally on schedule) but I think that the meeting succeeded in bringing together a disparate group of researchers, introducing them to each other and providing ideas for future work and employment opportunities (Kim-Anh Do was very good at selling the MD Anderson Cancer Center). I think the challenge for future Bayes on the Beach meetings will be managing the growth in the number of attendees. I look forward to next year’s meeting; I might even have some time to go to the beach!