Maturing from student to researcher

The other week, when I was in Sydney, I caught up with a friend who’s moved down there and is working in a similar role to me (albeit with a much larger group). He’s got a similar background to me; we both studied mathematics at QUT and focussed on computational and applied mathematics units but we now find ourselves working in (bio-)statistics*. I stayed in academia when he went off to work in industry but he has earned a Masters in computational statistics and has picked up Bayesian stats.

We both learned Bayesian stats through Gelman, Carlin, Stern and Rubin’s “Bayesian Data Analysis”, a book which is known to the Bayesian PhD students at QUT as “The Bible”; it’s been used by just about every lecturer that has taught the Honours level Bayesian Data Analysis class. In addition to The Bible, other Bayesian resources I’ve leaned on over the last few years are Gelman and Hill’s book on hierarchical models and Gelman’s blog. My friend and I got talking about Gelman’s work and how of late we seem to be disagreeing with some of the choices he makes in modelling. For my part, I don’t agree with (or is it understand?) the decisions in Gelman’s Bayesian approach to ANOVA (focussing more on the variance parameters than the means) and the particular parameterisation of the global variance parameter when he discusses the use of a folded non-central t distribution.

Now, it’s not that I think Gelman is wrong where he was previously right or that he’s losing the plot (after all, these papers are years old), but as I read his blog about the models he’s fitting now I’m coming to the realisation that I had been following what he’d been saying and am now looking elsewhere and seeing other ways of doing things. There are many different approaches that each have their strengths and weaknesses and philosophical (and practical) idiosyncrasies. One of the strengths of the Bayesian approach is that the incorporation of priors in the modelling approach gives you a very flexible class of models (hierarchical Bayesian modelling is one of the most useful tools I’ve picked up) and allows you a great amount of freedom in choosing how to build your priors. There is no one correct prior for each problem§; you can use a Jeffreys’ prior if you really want to go down the path of non-informativity or, if you’re content with (and can justify) it, a weakly informative Normal(0, 1e-6) or Gamma(0.001, 0.001). Sometimes you can even choose an appropriately flat prior that results in the posteriors of your parameters matching the results of the frequentist approach (where the 95% confidence and credible intervals have the same values, but not the same interpretations of course). Sometimes it’s appropriate to elicit a prior from experts or the literature and go for a very informative prior if you don’t have much data in your experiment/observation^.
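To make the point about flat priors concrete, here is a minimal sketch for the simplest case I can think of: the mean of normally distributed data with a known standard deviation. The data values, the known sigma and the function names are all made up for illustration, and I am reading Normal(0, 1e-6) as mean 0 and precision 1e-6 (BUGS style), which is an assumption. With a flat prior the 95% credible interval has the same endpoints as the usual z confidence interval; the weakly informative prior barely moves it, while a strongly informative prior moves it a lot, which is exactly why the sensitivity check matters.

```python
import math
from statistics import NormalDist, fmean


def z_confidence_interval(data, sigma, level=0.95):
    """Frequentist interval for a normal mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(len(data))
    centre = fmean(data)
    return centre - half_width, centre + half_width


def posterior_for_mean(data, sigma, prior_mean=0.0, prior_precision=0.0):
    """Conjugate normal update for the mean (sigma known).

    prior_precision = 0 corresponds to the improper flat prior.
    """
    data_precision = len(data) / sigma**2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * fmean(data)) / post_precision
    return NormalDist(post_mean, 1 / math.sqrt(post_precision))


def credible_interval(posterior, level=0.95):
    """Equal-tailed credible interval from a normal posterior."""
    tail = (1 - level) / 2
    return posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail)


# Hypothetical measurements, made up purely for illustration.
data = [4.1, 5.3, 3.8, 4.9, 5.6, 4.4]
sigma = 1.0  # assumed known

# (prior mean, prior precision) pairs to compare; all are illustrative choices.
priors = {
    "flat": (0.0, 0.0),
    "weak: Normal(0, precision 1e-6)": (0.0, 1e-6),
    "informative: Normal(0, precision 4)": (0.0, 4.0),
}

print("confidence interval:", z_confidence_interval(data, sigma))
for label, (mean, precision) in priors.items():
    interval = credible_interval(posterior_for_mean(data, sigma, mean, precision))
    print(label, interval)
```

The same comparison loop is a cheap way to check prior sensitivity in any conjugate model: rerun the update under a few defensible priors and see whether the intervals you would report change.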

There are lots of different ways to do things, lots of papers pushing different approaches. As a student you tend to look up to people as paragons of the field and go “Well if Gelman did it that way then I’d better do that too”; after four years of study I feel more comfortable looking at something and saying “No, I disagree”. I may not always be doing it the best way possible but I’ll always try to justify what I’ve done both to my collaborators and to the editor/reader of my papers. If it turns out I’ve done something wrong, so be it; I can always try again and learn from the experience.

* I’m yet to hear a satisfactory explanation as to what the difference is between a biostatistician and a statistician.

§ It’s worth checking out some of the ideas of so-called Objective Bayes if subjectivity is something you’re concerned about.

^ Whatever you do, check your sensitivity to your choice of priors.

Applying for jobs

I just sent in my application for QUT’s ECARD (Early Career Academic Recruitment and Development) program. The program is an ongoing attempt by QUT to recruit young researchers and train them as lecturers and academics who will sustain the university as the more senior academics retire. It’s not just a staff hiring program, though, as it requires that those recruited into the program undertake a postgraduate course in tertiary education and engage themselves with the administrative aspects of the university as well.

This round there are 11 positions in the Science and Engineering Faculty, two of which are with maths/stats. I’m hoping to get selected for one of these two positions. I’ve spent the last 4.5 years with an aerosol science research group learning how to be a statistician. It’s been a bit of a weird experience and reminds me of Professor Ian Turner (my Linear Algebra and Computational Mathematics lecturer) who has a PhD in Engineering yet does research into numerical algorithms for solving matrix systems and their mathematical properties.

The prospect of moving back to the School of Mathematics doesn’t feel like I’d be going “home” so much as finding a new group of people to work with. I did my undergrad and honours in mathematics but my primary Honours supervisor has moved to the School of Public Health and most of my cohort have either moved on to other universities or gone into industry. The people, location, organisational structure, units, etc. have all changed since I left. Despite all these changes it’s still the QUT School of Mathematical Sciences and there are other Bayesian statisticians over there, something I can’t really say for the School of Chemistry, Physics and Mechanical Engineering (where I am now).

Of course, I could remain employed in an applied sciences group and still collaborate with other statisticians (as I do now) but there’s a world of difference between my little office in M block with the other ILAQH members and a quarter of a floor full of mathematicians and statisticians in the new Science and Engineering Centre. For all my love of using Git and LaTeX to collaboratively write papers with people half a world away, you really can’t beat proximity to others like you to continually inspire and challenge you.

There are a few other jobs that I’m looking at, both in Brisbane and overseas. I got some very good advice from a friend of mine who’s currently working in Northern Europe; you should aim to find a job that:

  1. is interesting
  2. has good people
  3. is in a good place.

The first is clearly the most important in terms of job satisfaction. Working on a project you don’t care about (but can competently do) sounds soul-destroying. Working with people you don’t get along with and who don’t value what you do will lead to stress, conflict, etc. and working in a place which has nothing going for it will prevent you from being able to enjoy your time away from work.

I think an ECARD position within the School of Mathematical Sciences would fit all three of these, particularly if I end up lecturing one of the new ST01 units and have some discretion over the sort of research I do.

Statistics and microbiology

I’ve picked up a hobby over the last few months that is paying delicious dividends: homebrewing. It’s something I’d been wanting to try since about this time last year and I finally dropped the money (a cooking store voucher) on a cider homebrewing kit in February. My first batch was an apple cider that came with the kit and it’s been improving with age since the first bottle was opened in late February/early March. The second batch was a pear cider that a friend asked me to make for her; it was divided into two batches after primary fermentation so that I could try something different with the “excess”. The resulting pear and berry cider will make its debut quite soon, as it’s been patiently settling and aging over the last three weeks or so.

While I haven’t been keeping time series of the specific gravity, temperature and colour of the cider as it brews, there are certainly grounds to do so. Brewing and statistics have a history which goes back at least as far as William Sealy Gosset, who developed the t-distribution (and test) under the name “Student” while working at the Guinness brewery in 1908. Brewing involves balancing complex ecosystems of a whole lot of different things (depending on what you’re making) and is essentially a giant biochemical experiment. To get properly into brewing requires an understanding of botany, chemistry, microbiology, physics and statistics as you attempt to turn your basic ingredients into something which is tasty, non-toxic and perhaps even effervescent. I would like to start brewing beer at home soon, which will no doubt lead to me reading a lot more about hops, malt, wort, grains and yeasts and taking more fastidious notes.

So my exposure to microbiology has been twofold over the last year: working with a Finnish colleague on papers dealing with fungus and endotoxin counts in the UPTECH project, and brewing my own alcoholic cider at home. The main fungus paper has been submitted and we’re checking the modelling on the endotoxin paper so that it can be submitted before this colleague leaves in the next few days. I can’t think of a more fitting thing to bring to her farewell party than a drinkable microbiology experiment.

Bonus link: Homebrewing redditor who works in a microbiology lab discovers a new strain of fungus which produces the best beer he’s ever homebrewed.

Meta-analysis? Meta-regression?

Meta-analysis with a covariate feels really weird. I want to compare the relationship between the distributions of the mean concentration of endotoxin in the air and in dust samples across 50 locations. I wasn’t sure I did it the right way but the posterior estimates are consistent with my naïve approach of regressing the means of the air and dust samples. It’s important to account for the variability when doing this sort of post hoc analysis because a point estimate of the mean doesn’t reflect anywhere near the full set of knowledge you have about your parameters of interest.
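As a rough illustration of why the variability matters, here is a small sketch that is not the Bayesian model I actually fitted. It takes hypothetical location-level means and standard errors (the numbers and function names are invented), repeatedly draws plausible "true" means from normal distributions centred on the estimates, and refits a simple regression each time. The naïve approach gives one slope; this gives a spread of slopes. Resampling both variables like this is crude and tends to pull slopes towards zero, so treat it as a way of seeing the uncertainty rather than as a replacement for a proper hierarchical model.

```python
import random
from statistics import fmean, quantiles


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_draws(x_means, x_ses, y_means, y_ses, n_draws=2000, seed=1):
    """Refit the regression on means redrawn from Normal(mean, se)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        x = [rng.gauss(m, s) for m, s in zip(x_means, x_ses)]
        y = [rng.gauss(m, s) for m, s in zip(y_means, y_ses)]
        draws.append(ols_slope(x, y))
    return draws


# Hypothetical log-concentration summaries for five locations (made up).
air_means = [1.2, 1.8, 2.1, 2.9, 3.4]
air_ses = [0.2, 0.3, 0.2, 0.4, 0.3]
dust_means = [2.0, 2.7, 2.9, 4.1, 4.3]
dust_ses = [0.3, 0.2, 0.4, 0.3, 0.5]

# Naive approach: regress the point estimates and ignore their uncertainty.
naive = ols_slope(air_means, dust_means)

# Propagate the standard errors through the regression by simulation.
draws = slope_draws(air_means, air_ses, dust_means, dust_ses)
cuts = quantiles(draws, n=40)  # 39 cut points; first is 2.5%, last is 97.5%

print("naive slope:", naive)
print("central 95% of simulated slopes:", cuts[0], cuts[-1])
```

Setting every standard error to zero recovers the naïve slope exactly, which is a handy sanity check on the simulation.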

On an unrelated note, another UPTECH paper has been published. This one’s looking at spatial variation of particle number concentration in the school environment. Congratulations to Farhad Salimi, the first author of this paper, on the publication of his first paper. Farhad’s one of the PhD students on the UPTECH project and is due to finish his thesis later this year. I’ve worked with him on two of his papers (this one and another which has been submitted) and he’s really thrown himself into learning how to use R. This has not only made it easier for me to collaborate with him but it’s also made his analysis possible.

I know the impact factor’s not the be all and end all, but…

In Australia, at least, the impact factor of the journals you publish in plays a large role in your advancement in academia. Universities are always under pressure to publish their research in more prestigious journals, conflating the impact factor of the journal and the impact of the research published in it. There are many ways journals can game their impact factor, many ways researchers can game the indices that describe the impact of their work, etc. That said, it’s always good to aim to produce research that will be accepted in a high quality journal.

I’ve been excited about the PLoS journals since their launch and I believe QUT is a subscribing member, which means our publication fees are covered. It’s one of the best Open Access journal groups around and doesn’t appear to be a cash grab like some other publishers who are attempting to use Open Access as a business model to increase profits rather than because they believe in the free dissemination of research.

UPTECH collected fungi and endotoxin data at the 25 schools, and we’re about to submit the fungi paper (which means work must continue on the endotoxin paper). I was considering whether we should submit to PLoS One (IF 2011: 4.092) and then had a look at what other journals they have which may be an appropriate home. I really think once we get the clinical data from our Southern collaborators we should aim to do the best statistical modelling we can. I’m heartened by the fact that the head of the clinical group we’re working with has a strong background in stats and a desire to learn more Bayesian statistics. I don’t know if we can pull it off, but the prospect of having something investigating the role of fungi and endotoxin in child health published in PLoS Pathogens (IF 2011: 9.172) is exhilarating.

There is so much I don’t know that I wish I did

There are things I’ve heard of and never followed up like Expectation Maximisation (and Variational Bayes, for that matter), Expectation Propagation and Hamiltonian Monte Carlo. There are things I once learned about and forgot because I didn’t have the background at the time, such as Importance Sampling, Rejection Sampling and Slice Sampling. Then there are things at the cutting edge of statistical research that aren’t statistical methods so much as means of implementing them, and that are transforming the way we do statistics, such as CUDA.

I’ve managed to pick up a few little statistical novelties along the way such as nonparametric Bayes, hierarchical linear models and some of the theory behind GMRFs and Gaussian Processes but I feel like I’m lagging behind where I want to be. This could be a consequence of being based in a group with no other statisticians. Were I doing my postdoc in a statistics group I’d be more deeply immersed in a group with a culture of doing statistics research rather than doing scientific research which requires statistics I already know.

CAR 2013 PhD scholarship information

CAR, the body that funds my postdoc, are advertising two full time PhD scholarships for students who are interested in researching air quality. There are CAR investigators who will serve as supervisors in Brisbane, Sydney, Melbourne, Wollongong and Hobart.

Further information can be found in the three documents below.

New INLA stuff makes me happy

R-INLA is a really neat use of GMRFs for computing posteriors for quite complicated Bayesian Latent Gaussian Models. I used it for spatio-temporal modelling in my PhD and had to feel my way through a lot based on an old demo which was purely spatial.

As I got further and further into my PhD I saw extensions for R-INLA being written thanks to a few visits from, and email correspondence with, Dr Daniel Simpson, and the help list on the R-INLA site where Dan, Håvard Rue and Finn Lindgren are very quick with a reply.

A few days ago I got an email from Rue telling me he’d been made aware of one of my thesis papers and asking whether I’d mind running it with the new testing distribution of R-INLA. It’s the first time I’ve looked at the code again since submitting the paper for publication and it seems that an awful lot of work has been put into internal optimisation. The code for running my model requires less manual tuning now and I’m excited about using it in follow-up papers where I’ll be looking at more of the UPTECH data.

There’s also a new tutorial for spatial modelling with INLA, written by Elias T. Krainski, which covers a number of topics such as a simple spatial regression, a spatial model with misalignment and non-stationary spatial models (which I’ve seen talked about a few times but there’s very little documentation about them).

I think R-INLA, particularly the spatial modelling, has really come a long way over the last few years and it’s encouraging to see it being taken up at QUT where students would probably have used WinBUGS in the past. While R-INLA is less flexible than BUGS in the classes of models it can fit, I’d much rather do any spatial, spatio-temporal or non-parametric smoothing in R-INLA.

New things in Science and Engineering at QUT

Today was the first day of O week at QUT, a time when the relative calm of the summer break is disturbed by an influx of 17 year olds and university-run activities that always seem to generate a lot of noise. Is it possible to be a grumpy old man a week shy of 29?

I received an email from my supervisor this morning asking if I could take over from one of the other PhD students in our group who had fallen ill last week and not recovered in time for a presentation this morning. The presentation, scheduled for 9am, was to be the first of the inaugural Nanotechnology and Molecular Science HDR (Higher Degree Research, i.e. Doctoral and Masters students) symposium.

I’ve been moaning quietly, since starting my PhD in the School of Physical and Chemical Sciences, that the physics discipline had nothing like the School of Mathematics’ Postgrad Day. I really like Postgrad Day as it’s a good way to see what the other postgrad students are working on, what the research foci are within the school, and for students to improve their public speaking skills by delivering their research to a room of their peers and the other researchers in the school in an environment which is much more supportive than any conference is likely to be.

The NMS HDR symposium brought together a number of students and staff from optics, aerosol science, nanomaterials, biotechnology, forensics and other fields within the discipline and allowed them to see, perhaps for the first time, the research that others around them are doing. Even though my lab, ILAQH, is part of the Institute for Health and Biomedical Innovation, the distance between us and the remainder of IHBI is probably greater than just the physical distance between the two campuses. We do not seem to be particularly engaged with the culture of the remainder of IHBI and it’s very rare that our group will make the trek across to Kelvin Grove to see a presentation that is a short elevator ride away from the bulk of the IHBI membership.

I have really only been to IHBI a few times. The two most recent appearances have been for the IHBI Olympics (a week of activities where research domains compete against each other in fun activities such as Iron Chef and photo scavenger hunt) in 2011 where I performed as part of the Health and Human Wellbeing domain’s talent quest entry, a four person improvisation troupe called “Ha ha… what?”, and to present the work that the PhD students of the UPTECH project had been working on (where we killed half an hour of time before the presentations by playing impro warm-up games).

Continuing in this spirit of improvising in front of scientists, I spoke to the NMS HDR symposium at 45 minutes’ notice and in an eight minute talk managed to touch on the key points of the UPTECH project, explaining a small fraction of the science and discussing the richness of the dataset, the questions it will allow us to answer, and the diverse range of people we have involved in the project. I was told by one of the research staff in our group afterwards that it was refreshing to see a talk with no slides and that they were impressed at the quality of a talk given with so little preparation, and wondered whether I could give a presentation without speaking.

Professor Dennis Arnold, the organiser of the symposium, is now based on the same floor as me; he is one of a handful of people on our floor who are not members of ILAQH. I asked him if he thought the day was a success and he was very positive. I sincerely hope that the NMS HDR symposium continues next year and well into the future, as a way to foster interest across the traditional divide of physics vs chemistry.

I had to duck out of the symposium early to attend a meeting about one of the new units in the revamped Bachelor of Science degree. Dr Sama Low Choy, one of my supervisors, has asked me to run one of the collaborative workshops in the new quantitative methods unit (she says it’s because of my impro skills). Today was one of the planning days where we got to grips with the structure of the unit, the way the workshops are to be run and how what we are doing is significantly different to anything we’ve done before. I’ll write more about it later, such as after my first tutorial, but it’s very exciting to see QUT break with tradition and make this unit happen.

Through case studies with data sets relevant to their discipline, students will learn about quantitative methods in mathematics and statistics. We are ditching t tests, removing the need for statistical tables, adding structure to the group work to ensure people don’t get to ride on the effort of others and teaching R and MATLAB in a first year unit that only supposes Maths B. I’m really excited that we’re teaching first year students how to use software that is free (well, at least R is) and far more powerful than Microsoft Excel. One of the problems with MAB101, the old unit, was that the computation was done in Minitab, a piece of software that I’ve never known any researcher to use. One of the workshop leaders said that they want to go back and do undergrad again knowing that this unit now exists; I don’t blame them.

This will definitely be an exciting year for me, academically. A new course with new units, new facilities in the Science and Engineering Centre, new collaboration opportunities and the chance to pick somewhere new to move to at the end of the year.

Major achievements this week

  • final draft of final thesis paper ready to be submitted (pending feedback from one co-author)
  • final draft of thesis ready to be submitted (pending feedback from one supervisor)
  • got the code working properly for a paper I’m second author on, looking at deposition of ultrafine particles within the lung. While not as big a coding task as the Finnish paper this has been a major slog over the last few weeks.
  • pretty much sorted out the analysis and graphs for the paper on indoor and outdoor fungal counts in the UPTECH project (for which I’m an author some way down the line)
  • appointment form submitted for my postdoc.