Roger Peng posted at Simply Statistics about what it is to do statistical research and how research is essentially solving problems that can’t be solved with the current methods. The message I took from Peng’s post is that you can often solve 90% of a problem with current methods, that a lot of the time this is “good enough”, and that you can come back to the problem later with new approaches that go beyond those methods.

As part of the UPTECH project I’ve been doing a lot of work with Bayesian hierarchical linear models. While our data has been collected from a panel design (25 schools, two weeks at each), it’s not always appropriate to use a full-blown spatial model. For example, the microbiological work I’ve been doing with my Finnish collaborator is mostly solvable by using exchangeable means priors to estimate classroom-level and school-level effects. Recently I’ve also had to start looking at clustering techniques, meta-analysis, spatial modelling of high-resolution data, estimating personal exposure, large surveys, and many other applied science problems that require a novel statistical approach.
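
To make the idea of an exchangeable means prior concrete, here is a minimal sketch in Python. It is an illustration only, not the model used in the UPTECH analyses: the function name `shrink_group_means`, the numbers, and the assumption that the variances and the overall mean are known are all made up for the example. Each group mean (a classroom, say) is treated as a draw from a common normal distribution, so its estimate is pulled towards the overall mean, and groups with fewer observations are pulled harder.

```python
import numpy as np

def shrink_group_means(group_means, group_sizes, mu, sigma2, tau2):
    """Posterior means of group effects under an exchangeable normal prior.

    Assumed model (variances and overall mean treated as known):
        ybar_j | theta_j ~ Normal(theta_j, sigma2 / n_j)
        theta_j          ~ Normal(mu, tau2)
    """
    group_means = np.asarray(group_means, dtype=float)
    group_sizes = np.asarray(group_sizes, dtype=float)
    # Weight placed on the observed group mean; larger groups are shrunk less.
    weight = tau2 / (tau2 + sigma2 / group_sizes)
    return weight * group_means + (1.0 - weight) * mu

# Hypothetical classroom means and sample sizes, purely for illustration.
classroom_means = [2.0, 5.0, 8.0]
classroom_sizes = [4, 20, 4]
posterior = shrink_group_means(
    classroom_means, classroom_sizes, mu=5.0, sigma2=4.0, tau2=1.0
)
print(posterior)  # the two small classrooms move halfway towards mu: 3.5 and 6.5
```

In a full hierarchical model the overall mean and the variances would get priors of their own and be estimated alongside the group effects, and the same construction can be nested so that classroom effects are exchangeable within a school and school effects are exchangeable across schools.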

This sort of collaboration/consulting work, according to Terry Speed (whose post Peng is discussing), is a chance to meet lots of people and work on some interesting problems. For me, it has involved learning about existing techniques and trying to figure out how my collaborators and I can apply them to our data to do the best inference we can. With the UPTECH work, there’s always going to be a large list of authors due to the size of the project and the number of people involved in collecting data. Authorship will always be an issue with our papers, both in terms of inclusion and ordering, and we’ve got a decent process in place which makes people aware of papers as they’re finishing up (but not yet ready for submission). My personal belief is that one should always be able to point to a published paper and say “I did that”.

Collaboration in applied physics and chemistry seems to be a very different beast to collaboration in statistics and mathematics. Many of the postgraduate students I know in Mathematics have tended to write methods papers with their supervisor(s) and that’s it. There’s the occasional collaboration to apply the method to a problem, but unless you’re working on cross-disciplinary modelling work or a large project involving numerical simulation, there doesn’t appear to be much scope for multi-author work. Look back at some of the foundational statistical papers and you’ll see they’ve been written by a single author. Some foundations of Bayesian statistics, non-parametric included, spring to mind: de Finetti, 1937; Kingman, 1967; and Ferguson, 1973. The question of when to collaborate, with whom, and what it will add is part and parcel of modern science, but there are some fields where collaboration is rare, and keeping the author list short can lead to problems.

Statistical research is necessary when there’s a problem to be solved that is 0% solvable with the current methods. Some of what I’m doing is novel within the context of aerosol science, but I haven’t done as much stats research in my postdoc as in my PhD. This is no doubt a result of how much collaboration I’m doing. I get to work on a lot of problems but there’s not much original statistical work in these papers; if I’m lucky I get to do some of the “10%” research.

It’s hard to do statistical research in a physics group, especially as the only statistician here. I think if we had a second statistician in the group there’d be a lot more statistics being done, both in collaboration/consultation with the scientists and in the methods we use to solve problems. The “Airports of the Future” project has quite a number of statisticians working on, among other things, Bayesian Networks, and they’re extending the BN methodology as well as applying it to a novel problem. Two of the members of this team gave a talk at BRAG this morning about visualisation of BN results. This is something I’ll no doubt need to learn about sooner or later as we plan on using BNs with another project that ILAQH is putting together.

Four and a half years ago I was under the impression I was joining a physics group to do computational fluid dynamics. Since then I have been learning statistics almost constantly. It’s opened up many more opportunities for collaboration than CFD would have. The trick for me now is to put myself in a position where I’m working with other statisticians on statistics. We’ve got some work coming up soon with a more senior statistician at IHBI, which I hope will bring with it some opportunities for more statistical methodology work.

Unrelated PS: The 3rd edition of Gelman’s Bayesian Data Analysis is being released soon, with contributions from David Dunson and Aki Vehtari.