
R Markdown

I’ve been spending a bit of time over the last few days making an R tutorial for the members of my air quality research group. Rather than being a very general introduction to the use of R, e.g. file input/output, loops, making objects, I’ve decided to show a very applied workflow that involves the actual data analysis and explaining ideas as we go along. Part of this philosophy is that I’m not going to write a statistics tutorial, opting instead to point readers to textbooks that deal with first year topics such as regression models and hypothesis tests.

It’s been a very interesting experience, and it’s meant dealing with challenges along the way, such as PDF graphs that take up a disproportionate amount of file space relative to their importance to the overall guide, and thinking about how to structure the tutorial so that I can assume zero experience with R but some experience with self-directed learning. The current version can be seen here.

One of the ideas that Sama Low Choy had for SEB113 when she was unit coordinator and lecturer and I was just a tutor, was to write a textbook for the unit because there wasn’t anything that really covered our approach. Since seeing computational stats classes in the USA being hosted as repositories on GitHub I think it might be possible to use R Markdown or GitBook to write an R Markdown project that could be compiled either as a textbook with exercises or as a set of slides.
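To make the "one source, two outputs" idea concrete, a single R Markdown document can declare more than one output format in its YAML header. This is only a sketch; the title and options here are hypothetical, not an actual SEB113 document:

```yaml
---
title: "SEB113 Course Notes"
output:
  pdf_document:          # textbook-style output with a table of contents
    toc: true
  beamer_presentation:   # the same source compiled as slides
    slide_level: 2
---
```

The desired format is then picked at render time, e.g. rmarkdown::render("notes.Rmd", output_format = "beamer_presentation").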

Workshops

I had a very full week last week, with the annual Bayes on the Beach (BOB) on the Gold Coast (Mon-Wed) and Bayesian Optimal Design of Experiments (BODE) on Friday.

BOB is an annual workshop/retreat, run by Kerrie Mengersen and the BRAG group at QUT, that brings together a bunch of Australian and international statisticians for a few days of workshops, tutorials, presentations and fun in the sun. This year was, I think, my fourth year at BOB.

One of the recurring features is the workshop sessions, where around three researchers each pose a problem to the group and everyone decides which one they’re going to work on. This year I was asked to present a problem based on the air quality research I do and so my little group worked on the issue of how to build a predictive model of indoor PM10 based on meteorology, outdoor PM10 and temporal information. We were fortunate to have Di Cook in our group, who did a lot of interesting visual analysis of the data (she later presented a tutorial on how to use ggplot and R Markdown). We ended up discussing why tree models may not be such a great idea, differences in autocorrelation, and the usefulness of distributed lag models. It gave me a lot to think about and I hope that everyone found it as valuable as I did.
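For readers unfamiliar with distributed lag models, here is a minimal sketch in R. The data are simulated and every variable name is my own invention, but the structure, indoor PM10 regressed on current and lagged outdoor PM10, is the idea the group discussed:

```r
# Distributed lag sketch with simulated data (all names hypothetical).
set.seed(1)
n <- 200
pm10_out <- abs(rnorm(n, 20, 5))          # fake outdoor PM10 series
lag1 <- c(NA, head(pm10_out, -1))         # outdoor PM10 one hour ago
lag2 <- c(NA, NA, head(pm10_out, -2))     # two hours ago

# Fake indoor series that responds to current and recent outdoor levels
pm10_in <- 5 + 0.4 * pm10_out + 0.2 * ifelse(is.na(lag1), 20, lag1) + rnorm(n)

# Indoor concentration as a function of current and lagged outdoor PM10;
# the lag coefficients describe how quickly outdoor air penetrates indoors
fit <- lm(pm10_in ~ pm10_out + lag1 + lag2)
coef(fit)
```

In the real problem the meteorology and temporal terms would sit alongside the lagged outdoor concentrations.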

The two other workshop groups worked on ranking the papers of Professor Richard Boys (one of the keynote speakers) and building a Bayesian Network model of PhD completion time. Both groups were better attended than mine, which I put down to the idea that those two were “fun” workshops and mine sounded a lot like work. Still, a diverse range of workshops means something for everyone.

James McGree (QUT) asked me if I could come to the BODE workshop to discuss some open challenges in air quality research with regards to experimental design. I gave a brief overview of regulatory monitoring and the UPTECH project’s random spatial selection, and then brought in the idea that the introduction of low-cost sensors gives us the opportunity to measure in many places at once, but we still need to work out where to measure if we want to characterise human exposure to air pollution. While it was a small group I did get to have a good chat with the attendees about some possible ways forward. It was also good to see Julian Caley (AIMS) talk about monitoring on the Great Barrier Reef, Professor Tony Pettitt (QUT) talk about sampling for intractable likelihoods and Tristan Perez (QUT) discuss the interplay between experimental design and the use of robots.

It’s been a great end to the year to spend it in the company of statisticians working on all sorts of interesting problems. While I do enjoy my air quality work, and R usage is increasing at ILAQH, it’s an entirely different culture to being around people who spend their time working out whether they’re better off with data.table and reshape2 or dplyr and tidyr.

Australia-China Centre turns 1

Has it already been a year?

This week the Australia-China Centre for Air Quality Science and Management had its second annual meeting, at QUT. We got updates on the various research activities that have happened, are happening and are planned. There’s lots of interesting stuff being done to tackle a variety of problems, such as reducing workplace exposure to air pollution, quantifying the exposure of individuals and using unmanned aerial vehicles to measure air quality.


Tuesday night we had the conference dinner out at the Mount Coot-tha Botanic Gardens, at the function space at the cafe/restaurant out there. I don’t think I’ve been there since my cousin’s wedding reception 15-20 years ago. I really liked that efforts were made to ensure each table had a mix of senior professors, mid- and early-career researchers and PhD students. It made for a very inclusive dinner and many different topics of conversation. Luckily I was seated with a co-worker with whom I could trade my fish entree and main for something a little more land-based. There was even a birthday cake (chocolate mousse cake) and a number of people joined in singing “Happy Birthday” to the ACC.

Wednesday we spent the day workshopping the various planned projects to determine what issues need to be addressed in the collection and analysis of data. I ended up sitting with a group looking at the impacts of indoor temperature on mortality rates, particularly trying to estimate the relative risk of extreme heat and cold. It was good to be confronted with some new challenges to think about, rather than the same stuff I’ve been working on almost non-stop this year.

All in all, it was a good meeting even though the stress levels around here were through the roof in the lead-up. I ended up taking photos of nearly all of the presenters on the Tuesday as well as group photos with our Chinese collaborators and special invited guests.

Just got back from China

I’ve spent the last few days travelling to and from Beijing, China for the launch of the new Australia-China Centre for Air Quality Science and Management. This is a huge thing for us at ILAQH because it sets up an international collaborative agreement between a bunch of institutions in Australia and China who each have a different set of expertise that they can bring to the table, allowing us to undertake more ambitious projects than before and seek funding from a wider range of sources.

There was a lot of prep and there were behind-the-scenes meetings on the first day, so Mandana (a colleague of mine) and I took a trip downtown to the Forbidden City and Tiananmen Square and did some shopping at Wangfujing. It would have been cold enough without the wind, but it was such a clear day and everything was wonderful.

Day 14 - Tian Anmen Square Beijing

The first day of the launch, held at CRAES, featured the constituent groups giving a short presentation about their work. There are a lot of great people working on some really interesting stuff and I’m very excited about the prospect of working with some of them over the coming years.

Day 15 - Lina

The second day had us split into three groups to propose various objectives and projects and put names on paper for who might be good leaders or key players in these fields as part of our centre. I joined the Transport Emissions group to discuss the control of emissions at their source, the investigation of atmospheric transformation processes and the development and uptake of new technologies. A lot of the ground work had already been laid at a planning meeting earlier in the year but it was good to put together some more concrete research topics.

After this, we went out to the National Jade Hotel for dinner, where we got to try another of the varied styles of Chinese cuisine; this time from a coastal region in North-Eastern China. I wish I had paid more attention to the names of the various styles, but I enjoyed trying everything over the course of the trip, even the tripe.

Day 16 - The most important decision

The final day saw us tidying up the proposals, and an early finish meant that I got the afternoon off with Mandana, Felipe and Dion. After stumbling our way through a menu with pictures but no English translations, we had a big lunch and set off on the subway to the Temple of Heaven. It was certainly warmer than the Forbidden City (less stone, more trees) and was a very peaceful and pleasant end to the trip as we sat down at a bakery café and discussed If You Are The One while eating cream buns and drinking coffee (or in my case, peach black tea with milk). After heading back to the hotel, Mandana and I took a walk around the Bird’s Nest stadium which was only a block from our hotel. It looks like Beijing is putting effort into maintaining the area as a public plaza rather than just the grounds of a sports stadium, so even late in the cold evening it was full of families and groups of friends walking, talking and taking photographs.

Day 17 - Bird's Nest

An early morning taxi to the airport saw the start of 18 hours of travel. It’s nice to be back in one’s own bed, but I’m off to the airport again tomorrow for a 5:30am flight to Sydney for a workshop on exposure assessment with colleagues from the Centre for Air quality and health Research and evaluation. After a week of disastrously bad coffee, I’m glad that I booked accommodation which advertises itself as being 15m from the Toby’s Estate café.

Two pieces of good news this week

The full paper from the EMAC2013 conference last year is now available online. If you’re interested in the statistical methodology we used for estimating the inhaled dose of particles by students in the UPTECH project, you should check out our paper at the ANZIAM Journal (click the link that says “PDF” down the bottom under Full Text).

More importantly, though, we were successful in applying for an ARC Discovery Project! This project will run for three years and combines spatio-temporal statistical modelling, sensor miniaturisation and mobile phone technologies to allow people to minimise their exposure to air pollution. Our summary of the project, from the list of successful projects:

This interdisciplinary project aims to develop a personalised air pollution exposure monitoring system, leveraging the ubiquitousness and advancements in mobile phone technology and state of the art miniaturisation of monitoring sensors, data transmission and analysis. Airborne pollution is one of the top contemporary risks faced by humans; however, at present individuals have no way to recognise that they are at risk or need to protect themselves. It is expected that the outcome will empower individuals to control and minimise their own exposures. This is expected to lead to significant national socioeconomic benefits and bring global advancement in acquiring and utilising environmental information.

Other people at ILAQH were also successful in getting a Discovery Project grant looking at new particle formation and cloud formation in the Great Barrier Reef. I won’t be involved in that project but it sounds fascinating.

Revising another paper

We got a paper back from the reviewers a few days ago and there are some comments requesting revisions to the explanation of the statistical methods and the analysis. It’s interesting coming back to this paper, about a year after I last saw it (it’s been sent around to a few different journals to try to find a home for it). The PhD student who is the main author got into R and ggplot2 last year and has done some good work with linear mixed effects models and visualisation but some of the plots are the same sort of thing one might do in Excel (lots of boxplots next to each other rather than making use of small multiples).

So now I get to delve back into some data and analysis that’s about a year old with fresh eyes. Having done more with ggplot2 over the last 12 months, there are some things that I will definitely change about this. The student and I had a chat this morning about how to tackle it, and we’re trying to choose the best way to split up our data for visualisation: 6 treatments × 4 measurement blocks × 2 measures (PM2.5 mass concentration and PNC), for a total of 48 boxplots, density plots or histograms.
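As a sketch of the small-multiples alternative, ggplot2’s faceting can lay out the treatment-by-block-by-measure combinations in one panel grid. The column names and simulated values below are hypothetical stand-ins for the real dataset:

```r
library(ggplot2)

# Fake data with the same shape as the real problem:
# 6 treatments x 4 blocks x 2 measures, 30 observations per cell
set.seed(113)
dat <- expand.grid(treatment = paste0("T", 1:6),
                   block     = paste0("Block ", 1:4),
                   measure   = c("PM2.5", "PNC"))
dat <- dat[rep(seq_len(nrow(dat)), each = 30), ]
dat$value <- rlnorm(nrow(dat))

# One boxplot per treatment, blocks as columns, measures as rows
p <- ggplot(dat, aes(x = treatment, y = value)) +
  geom_boxplot() +
  facet_grid(measure ~ block, scales = "free_y")
```

The scales = "free_y" argument matters here because mass concentrations and particle number concentrations live on very different scales.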

A paper with another PhD student has had its open discussion finalised now, which means more writing is probably going to happen. I find ACP‘s model quite interesting. The articles are peer reviewed, published for discussion, and then revised by the authors for final publication. I guess it spreads the review work out a bit and allows for multiple voices to be heard before final publication, each with a different approach and background.

Learning different programming languages

One of the biggest changes I noticed moving from doing statistics in Minitab (in first year data analysis) to doing statistics in R (in third year statistical inference) was that R encourages you to write functions. Normally this is done by writing functions in R’s own language (that call other functions, also written in R, which eventually call functions written in C) but it’s also possible to make use of other languages to do the heavy lifting. This isn’t unique to R, of course; MATLAB encourages the use of MEX files to improve run-times when you need to call the same custom function over and over again.

I’ve really only used high-level languages to do my statistics, making use of other people’s optimised code that does the things that I want. I’ve seen the development of pyMCMC by researchers at QUT and someone from my NPBayes reading group made quite heavy use of Rcpp in his thesis. Python and C++ are probably the two languages that would be the most useful to learn given their ubiquity and reputation. I have been putting off learning these for years as I know that there’s a large time investment required to become proficient in programming and no external pressure to learn (unlike learning R as part of my PhD thesis work).
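For anyone curious what the barrier to entry actually looks like, Rcpp makes the first step surprisingly small: you can compile a C++ function from inside an R session and call it like any other R function. This toy example is mine, not code from the thesis mentioned above:

```r
library(Rcpp)

# Compile a small C++ function and expose it to R
cppFunction('
  double sumSquares(NumericVector x) {
    double total = 0;
    for (int i = 0; i < x.size(); ++i) total += x[i] * x[i];
    return total;
  }
')

sumSquares(c(1, 2, 3))  # 14
```

The loop body is plain C++, but Rcpp handles all the conversion between R vectors and C++ types.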

There’s no doubt that writing optimised code is a desirable thing to do, and that knowing more than one programming language (and how to use them together) gives you a much richer toolbox for numerically solving problems. I’m now at a point, though, where it looks like I may need to bite the bullet and pick up C++. JAGS, which I use through rjags in R, is a stable, fast platform for MCMC-based inference. It’s written in C++ and notifies you every time you load it in R that it has loaded the basemod and bugs modules. There are additional modules available (check in \JAGS\JAGS-3.4.0\x64\modules\) and it’s possible to write your own, as long as you know C++.
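Loading an extra module from R looks like this; the "glm" module ships with JAGS, whereas a home-made module would first have to be compiled into that modules directory:

```r
library(rjags)      # reports that the basemod and bugs modules are loaded

load.module("glm")  # extra samplers for GLM-style nodes
list.modules()      # should now include "glm" alongside "basemod" and "bugs"
```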

I’m at a point with the work I’ve been doing on estimating personal dose of ultrafine particles that I’d like to make the modelling more Bayesian, which includes figuring out a way to include the deposition model in the MCMC scheme (as I’d like to put a prior on the shape parameter of the reconstructed size distribution). My options seem to be either writing a JAGS module that will allow me to call a C++ified version of the function or to abandon JAGS and write a Gibbs sampler (or Metropolis-Hastings, but Gibbs will likely be quicker given the simplicity of the model I’m interested in). Either solution will stretch me as a programmer and probably give me a better understanding of the problem. Eubank and Kupresanin’s “Statistical Computing in C++ and R” is staring at me from the shelf above my desk.
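To give a sense of what “abandon JAGS and write a Gibbs sampler” entails, here is a minimal one for a toy normal model with conjugate priors. The dose model would replace this likelihood, and all of the numbers here are illustrative:

```r
# Gibbs sampler sketch: normal model with unknown mean mu and precision tau,
# conjugate priors mu ~ N(m0, 1/p0) and tau ~ Gamma(a0, b0).
set.seed(42)
y <- rnorm(50, mean = 3, sd = 2)   # fake data standing in for the real model
n <- length(y)

m0 <- 0; p0 <- 0.01; a0 <- 0.01; b0 <- 0.01

n_iter <- 5000
mu <- numeric(n_iter); tau <- numeric(n_iter)
mu[1] <- 0; tau[1] <- 1

for (i in 2:n_iter) {
  # Full conditional for mu given tau is normal
  prec      <- p0 + n * tau[i - 1]
  mean_post <- (p0 * m0 + tau[i - 1] * sum(y)) / prec
  mu[i]     <- rnorm(1, mean_post, 1 / sqrt(prec))

  # Full conditional for tau given mu is gamma
  tau[i] <- rgamma(1, a0 + n / 2, b0 + 0.5 * sum((y - mu[i])^2))
}

mean(mu[-(1:1000)])              # should be near the true mean, 3
1 / sqrt(mean(tau[-(1:1000)]))   # should be near the true sd, 2
```

The appeal of JAGS is precisely that it writes these full conditionals (or falls back to slice/Metropolis steps) for you; the appeal of hand-rolling them is that you can put whatever you like, deposition model included, inside the updates.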