Some people make their visualisations in Excel, I make mine in R and others still use things like Processing or InDesign. Bret Victor shows us how the various ideas from each approach can be combined to make dynamic visualisations.
I’ve picked up a hobby over the last few months that is paying delicious dividends: homebrewing. It’s something I’d been wanting to try since about this time last year and I finally dropped the money (a cooking store voucher) on a cider homebrewing kit in February. My first batch was an apple cider that came with the kit and it’s been improving with age since the first bottle was opened in late February/early March. The second batch was a pear cider that a friend asked me to make for her; it was divided into two batches after primary fermentation so that I could try something different with the “excess”. The resulting pear and berry cider will make its debut quite soon, as it’s been patiently settling and aging over the last three weeks or so.
While I haven’t been keeping time series of the specific gravity, temperature and colour of the cider as it brews, there are certainly grounds to do so. Brewing and statistics share a history that goes back at least as far as William Sealy Gosset, who developed the t-distribution (and test) under the pseudonym “Student” while working at the Guinness brewery in 1908. Brewing means balancing a complex ecosystem of ingredients and organisms (depending on what you’re making) and is essentially a giant biochemical experiment. Getting properly into brewing requires an understanding of botany, chemistry, microbiology, physics and statistics as you attempt to turn your basic ingredients into something which is tasty, non-toxic and perhaps even effervescent. I would like to start brewing beer at home soon, which will no doubt lead to me reading a lot more about hops, malt, wort, grains and yeasts, and to taking more fastidious notes.
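If I do start logging gravity readings, even a couple of lines of R turn them into an alcohol estimate via the common homebrewers’ approximation ABV ≈ (OG − FG) × 131.25. The gravity values below are made-up examples, not readings from my cider:

```r
# Estimate alcohol by volume (ABV) from specific gravity readings,
# using the common approximation ABV ~ (OG - FG) * 131.25.
# These readings are illustrative, not measurements from my batches.
abv <- function(og, fg) (og - fg) * 131.25

og <- 1.050  # original gravity, before fermentation
fg <- 1.005  # final gravity, after fermentation
round(abv(og, fg), 2)  # about 5.91% ABV
```

With a full time series of gravities you could even watch the fermentation curve flatten out as the yeast runs out of sugar.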
So my exposure to microbiology has been twofold over the last year: working with a Finnish colleague on papers dealing with fungus and endotoxin counts in the UPTECH project, and brewing my own alcoholic cider at home. The main fungus paper has been submitted and we’re checking the modelling on the endotoxin paper so that it can be submitted before this colleague leaves in the next few days. I can’t think of a more fitting thing to bring to her farewell party than a drinkable microbiology experiment.
Some advice for my SEB113 students who may struggle with the workload of first semester university comes from the sagest of equines, @horse_ebooks.

Hugh Possingham’s talking on Monday about the mathematics and economics of conservation as part of the BrisScience seminar series. I’ve been meaning to make it to a Possingham talk for a while.
Meta-analysis with a covariate feels really weird. I want to compare the relationship between the mean concentrations of endotoxin in the air and in dust samples across 50 locations. I wasn’t sure I had done it the right way, but the posterior estimates are consistent with my naïve approach of regressing the means of the air and dust samples against each other. It’s important to account for the variability when doing this sort of post hoc analysis, because a point estimate of the mean doesn’t reflect anywhere near the full set of knowledge you have about your parameters of interest.
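As a toy sketch of that last point (simulated data, nothing to do with the actual UPTECH measurements), compare a regression of one set of location means on another that ignores each location’s uncertainty with a precision-weighted one that uses the standard errors:

```r
# Toy sketch (simulated data, not the UPTECH measurements) of why
# per-location uncertainty matters when regressing one set of means
# on another. Locations with noisier means should count for less.
set.seed(1)
n         <- 50                        # number of locations
air_hat   <- rnorm(n, 10, 2)           # "observed" mean air concentrations
dust_true <- 2 + 0.8 * air_hat + rnorm(n, 0, 1)
se        <- runif(n, 0.2, 2)          # each location's standard error
dust_hat  <- rnorm(n, dust_true, se)   # observed dust means, with error

naive    <- lm(dust_hat ~ air_hat)                    # ignores uncertainty
weighted <- lm(dust_hat ~ air_hat, weights = 1/se^2)  # precision-weighted
coef(naive)
coef(weighted)
```

The weighted fit downweights the noisiest locations; the fully Bayesian version additionally propagates the uncertainty in the air-side means, which is where the “meta-analysis with a covariate” flavour comes in.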
On an unrelated note, another UPTECH paper has been published, this one looking at spatial variation of particle number concentration in the school environment. Congratulations to Farhad Salimi, the lead author, on his first publication. Farhad’s one of the PhD students on the UPTECH project and is due to finish his thesis later this year. I’ve worked with him on two of his papers (this one and another that has been submitted) and he’s really thrown himself into learning R. This has not only made it easier for me to collaborate with him, it’s also made his analysis possible.
In Australia, at least, the impact factor of the journals you publish in plays a large role in your advancement in academia. Universities are always under pressure to publish their research in more prestigious journals, conflating the impact factor of a journal with the impact of the research published in it. There are many ways journals can game their impact factors, and many ways researchers can game the indices that describe the impact of their work. That said, it’s always good to aim to produce research that will be accepted by a high quality journal.
I’ve been excited about the PLoS journals since their launch and I believe QUT is a subscribing member, which means our publication fees are covered. It’s one of the best Open Access journal groups around and doesn’t appear to be a cash grab, unlike some other publishers who use Open Access as a business model to increase profits rather than because they believe in the free dissemination of research.
UPTECH collected fungi and endotoxin data at the 25 schools, and we’re about to submit the fungi paper (which means work must continue on the endotoxin paper). I was considering whether we should submit to PLoS One (IF 2011: 4.092) and then had a look at what other journals they have which may be an appropriate home. I really think once we get the clinical data from our Southern collaborators we should aim to do the best statistical modelling we can. I’m heartened by the fact that the head of the clinical group we’re working with has a strong background in stats and a desire to learn more Bayesian statistics. I don’t know if we can pull it off, but the prospect of having something investigating the role of fungi and endotoxin on child health published in PLoS Pathogens (IF 2011: 9.172) is exhilarating.
There are things I’ve heard of and never followed up, like Expectation Maximisation (and Variational Bayes, for that matter), Expectation Propagation and Hamiltonian Monte Carlo. There are things I once learned about and forgot because I didn’t have the background at the time, such as Importance Sampling, Rejection Sampling and Slice Sampling. Then there are things at the cutting edge of statistical research that aren’t statistical methods so much as means of implementing them, and that are transforming the way we do statistics, such as CUDA.
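Some of the forgotten ones are pleasingly simple to rediscover. Rejection sampling, for instance, fits in a few lines of base R; here’s a sketch that draws from a Beta(2, 5) target using a Uniform(0, 1) proposal (the target and proposal are my choices for illustration, not from any particular project):

```r
# Rejection sampling from Beta(2, 5) using a Uniform(0, 1) proposal.
# M must bound the density ratio f(x)/g(x); with g = 1 on [0, 1],
# M is just the maximum of the target density, attained at x = 1/5.
set.seed(42)
M <- dbeta(1/5, 2, 5)   # maximum of the Beta(2, 5) density

rejection_sample <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    x    <- runif(n)                   # proposals from g
    u    <- runif(n)
    keep <- u < dbeta(x, 2, 5) / M     # accept with probability f(x) / (M g(x))
    out  <- c(out, x[keep])
  }
  out[1:n]
}

draws <- rejection_sample(10000)
mean(draws)  # should be close to the true mean 2 / (2 + 5), about 0.286
```

The acceptance rate is 1/M (roughly 40% here), which is exactly why rejection sampling becomes hopeless in high dimensions and fancier machinery like slice sampling and HMC exists.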
I’ve managed to pick up a few little statistical novelties along the way, such as nonparametric Bayes, hierarchical linear models and some of the theory behind GMRFs and Gaussian Processes, but I feel like I’m lagging behind where I want to be. This could be a consequence of being based in a group with no other statisticians. Were I doing my postdoc in a statistics group I’d be more deeply immersed in a culture of doing statistical research, rather than scientific research that requires statistics I already know.