Tag Archives: collaboration

Two pieces of good news this week

The full paper from the EMAC2013 conference last year is now available online. If you’re interested in the statistical methodology we used for estimating the inhaled dose of particles by students in the UPTECH project, you should check out our paper at the ANZIAM Journal (click the link that says “PDF” down the bottom under Full Text).

More importantly, though, we were successful in applying for an ARC Discovery Project! This project will run for three years and combines spatio-temporal statistical modelling, sensor miniaturisation and mobile phone technologies to allow people to minimise their exposure to air pollution. Our summary of the project, from the list of successful projects:

This interdisciplinary project aims to develop a personalised air pollution exposure monitoring system, leveraging the ubiquitousness and advancements in mobile phone technology and state of the art miniaturisation of monitoring sensors, data transmission and analysis. Airborne pollution is one of the top contemporary risks faced by humans; however, at present individuals have no way to recognise that they are at risk or need to protect themselves. It is expected that the outcome will empower individuals to control and minimise their own exposures. This is expected to lead to significant national socioeconomic benefits and bring global advancement in acquiring and utilising environmental information.

Other people at ILAQH were also successful in getting a Discovery Project grant looking at new particle formation and cloud formation in the Great Barrier Reef. I won’t be involved in that project but it sounds fascinating.


Stop, collaborate and listen

Roger Peng posted at Simply Statistics about what it is to do statistical research and how research is essentially solving problems that can’t be solved with the current methods. The message I took from Peng’s post is that often you can 90% solve a problem with current methods and that a lot of the time this is “good enough” and you can come back to the problem later with some new approaches that go beyond the current methods.

As part of the UPTECH project I’ve been doing a lot of work with Bayesian hierarchical linear models. While our data has been collected from a panel design (25 schools, two weeks at each) it’s not always appropriate to use a full-blown spatial model. For example, the microbiological work I’ve been doing with my Finnish collaborator is mostly solvable by using exchangeable means priors to estimate classroom level and school level effects. Recently I’ve also had to start looking at clustering techniques, meta-analysis, spatial modelling of high-resolution data, estimating personal exposure, large surveys, and many other applied science problems that require a novel statistical approach.
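To make the "exchangeable means" idea concrete, here is a minimal sketch in Python. It is not the UPTECH analysis. It assumes a single grouping level (classrooms only, rather than classrooms within schools), treats the within-group and between-group variances as known, and plugs in the mean of the group means for the common mean instead of giving it a prior. The function name `partial_pool`, the classroom labels and the numbers are all made up for illustration. Under those assumptions each classroom estimate is a precision-weighted average of its own sample mean and the common mean, so classrooms with fewer measurements are pulled more strongly towards the common mean.

```python
from statistics import mean


def partial_pool(groups, sigma2, tau2):
    """Shrink each group's sample mean towards a common mean.

    groups: dict mapping a group label to a list of measurements
    sigma2: within-group variance (assumed known for this sketch)
    tau2:   between-group variance (assumed known for this sketch)
    """
    # Plug-in estimate of the common mean: the mean of the group means.
    mu = mean(mean(values) for values in groups.values())

    pooled = {}
    for label, values in groups.items():
        n = len(values)
        # Weight on the group's own data: data precision over total precision.
        weight = (n / sigma2) / (n / sigma2 + 1.0 / tau2)
        pooled[label] = weight * mean(values) + (1.0 - weight) * mu
    return pooled


# Hypothetical measurements from three classrooms (illustrative values only).
classrooms = {
    "room_a": [4.0, 6.0],
    "room_b": [10.0, 12.0, 11.0, 11.0],
    "room_c": [8.0],
}

estimates = partial_pool(classrooms, sigma2=4.0, tau2=1.0)
for label, value in estimates.items():
    print(label, round(value, 2))
```

In a full Bayesian treatment the variances and the common mean would get priors of their own, and a second exchangeable layer would sit above the classrooms to give school level effects.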

This sort of collaboration/consulting work, according to Terry Speed (whose post Peng is discussing), is a chance to meet lots of people and work on some interesting problems. For me, it has involved learning about existing techniques and trying to figure out how my collaborators and I can apply them to our data to do the best inference we can. With the UPTECH work, there’s always going to be a large list of authors due to the size of the project and the number of people involved in collecting data. Authorship will always be an issue with our papers, both in terms of inclusion and ordering, and we’ve got a decent process in place which makes people aware of papers as they’re finishing up (but not yet ready for submission). My personal belief is that one should always be able to point to a published paper and say “I did that”.

Collaboration in applied physics and chemistry seems to be a very different beast to collaboration in statistics and mathematics. Many of the postgraduate students I know in Mathematics have tended to write methods papers with their supervisor(s) and that’s it. There’s the occasional collaboration to apply the method to a problem, but unless you’re working on cross-disciplinary modelling work or a large project involving numerical simulation there doesn’t appear to be much scope for multi-author work. Look back at some of the foundational statistical papers and you’ll see they’ve been written by a single author (some (non-parametric) Bayesian foundations spring to mind: de Finetti, 1937; Kingman, 1967; and Ferguson, 1973). The question of when to collaborate, with whom, and what it will add is part and parcel of modern science but there are some fields where collaboration is rare and keeping the author list short can lead to problems.

Statistical research is necessary when there’s a problem to be solved that is 0% solvable with the current methods. Some of what I’m doing is novel, within the context of aerosol science, but I haven’t done as much stats research in my postdoc as in my PhD. This is no doubt a result of my doing as much collaboration as I am. I get to work on a lot of problems but there’s not much original statistical work in these papers; if I’m lucky I get to do some of the “10%” research.

It’s hard to do statistical research in a physics group, especially as the only statistician here. I think if we had a second statistician in the group there’d be a lot more statistics being done both in terms of collaboration/consultation with the scientists and the methods we use to solve problems. The “Airports of the Future” project has quite a number of statisticians working on, among other things, Bayesian Networks, and they’re extending the BN methodology as well as applying it to a novel problem. Two of the members of this team gave a talk at BRAG this morning about visualisation of BN results. This is something I’ll no doubt need to learn about sooner or later as we plan on using BNs with another project that ILAQH is putting together.

Four and a half years ago I was under the impression I was joining a physics group to do computational fluid dynamics. Since then I have been learning statistics almost constantly. It’s opened up many more opportunities for collaboration than CFD would have. The trick for me now is to try and put myself in a position where I’m working with other statisticians on statistics. We’ve got some work coming up soon with a more senior statistician at IHBI, which I hope will bring with it some opportunities for more statistical methodology work.

Unrelated PS: The 3rd edition of Gelman’s Bayesian Data Analysis is being released soon, with contributions from David Dunson and Aki Vehtari.

Ad hoc collaboration

R-bloggers has announced the launch of RPubs, a free service which makes it easy to publish code and analysis on the web. It’s based on RStudio and the markdown package and looks like a great way to show analysis to co-workers who might not have R on their computers, for those times when you don’t feel like writing a report. I really like this idea and might end up using it in my office to show what we can do with statistics.

Another thing I’ve been thinking about is the potential to use an Apple TV and its screen sharing capabilities to do presentation work from iPhones, iPads and Mac computers. A lot of people in my office have iPhones, so an Apple TV hooked up to an HDMI screen (surely universities just leave these lying around) might be a good way to get a group of people to take some notes or share prepared slides with a small room of people. For example, if people had a PDF version of slides on their iPhones they could take control of the Apple TV and use their iPhone to flick through the slides, allowing everyone to stay in their seats and control the slideshow from their own device.

I was excited by Google Wave when it first launched, as it combined a lot of what I liked about Gmail, Google Docs and Google Chat with an extension system, making it an incredibly powerful and flexible platform for collaborative work. Unfortunately it was released prematurely and died off after a flurry of uptake. Google Plus doesn’t really make up for it, either. I really liked the idea of collaboratively writing a document and being able to add in a voting gadget to resolve whether a section should be included. I used it socially to determine the dates of picnics with friends, which was probably where most of my use was directed.

Probably the best example of how useful I found it was in writing a manual of procedures with about ten other volunteers who would ask questions. As we answered the questions, we folded the answers into the main part of the document. This was much more useful than writing a static document and then having a separate email list for discussion, or using track changes in Microsoft Word to email around a huge document that kept on growing.

I have high hopes for the internet in terms of ad hoc collaboration, particularly academic collaboration. I find GitHub really exciting because it allows me to work on a private project and then add a collaborator when they come on board. Once a project is finished and the paper published, that private repository can be made public and anyone can fork it and do with it what they will. If they’re intrigued by what’s been done, they could contact me and discuss what they’ve done and we can build a new project based on their fork of my work. With so much of my work being based on R or MATLAB and written up in LaTeX, I find this potential way of working quite sensible. Add in the fact that GitHub gives you a wiki system for each project and you’ve got a great tool at your disposal.

A somewhat ad hoc collaborative tool that I organised is the wiki for QUT’s Bayesian Non-parametrics reading group. This is a repository for the collective work of the group, including Q&A on the papers we’re reading, notes from the meetings, a list of papers read, code chunks, links to videos explaining what we’re working on, etc. It’s been a really useful tool and I’d hope that others interested in the same work could use it as a resource for their own learning.

There’s a lot of really cool stuff out there. It’s a matter of finding useful tools that don’t have particularly high barriers to entry and allow non-experts to view expertly produced material (like on RPubs). The longer it’s been since one was a student, the less likely one seems to be to adopt new workflow practices. I’ve suggested git to my supervisors as a good way for our groups to work but I have a feeling that none of them are interested enough in distributed version control to put the effort into learning how to use it. So for now it’s annotated PDFs or printed pages with scribbles on them rather than making the edits to a LaTeX source file and committing and pushing their changes to a shared repository.