Just got back from China

I’ve spent the last few days travelling to and from Beijing, China for the launch of the new Australia-China Centre for Air Quality Science and Management. This is a huge thing for us at ILAQH because it sets up an international collaborative agreement between a bunch of institutions in Australia and China who each have a different set of expertise that they can bring to the table, allowing us to undertake more ambitious projects than before and seek funding from a wider range of sources.

There was a lot of prep and behind-the-scenes meetings to take place on the first day, so Mandana (a colleague of mine) and I took a trip downtown to the Forbidden City and Tiananmen Square and did some shopping at Wangfujing. It would have been cold enough without the wind, but it was such a clear day and everything was wonderful.

Day 14 - Tian Anmen Square Beijing

The first day of the launch, held at CRAES, featured the constituent groups giving a short presentation about their work. There are a lot of great people working on some really interesting stuff and I’m very excited about the prospect of working with some of them over the coming years.

Day 15 - Lina

The second day had us split into three groups to propose various objectives and projects and put names on paper for who might be good leaders or key players in these fields as part of our centre. I joined the Transport Emissions group to discuss the control of emissions at their source, the investigation of atmospheric transformation processes and the development and uptake of new technologies. A lot of the groundwork had already been laid at a planning meeting earlier in the year, but it was good to put together some more concrete research topics.

After this, we went out to the National Jade Hotel for dinner, where we got to try another of the varied styles of Chinese cuisine; this time from a coastal region in North-Eastern China. I wish I had paid more attention to the names of the various styles, but I enjoyed trying everything over the course of the trip, even the tripe.

Day 16 - The most important decision

The final day saw us tidying up the proposals, and an early finish meant that I got the afternoon off with Mandana, Felipe and Dion. After stumbling our way through a menu with pictures but no English translations, we had a big lunch and set off on the subway to the Temple of Heaven. It was certainly warmer than the Forbidden City (less stone, more trees) and was a very peaceful and pleasant end to the trip as we sat down at a bakery café and discussed If You Are The One while eating cream buns and drinking coffee (or in my case, peach black tea with milk). After heading back to the hotel, Mandana and I took a walk around the Bird’s Nest stadium which was only a block from our hotel. It looks like Beijing is putting effort into maintaining the area as a public plaza rather than just the grounds of a sports stadium, so even late in the cold evening it was full of families and groups of friends walking, talking and taking photographs.

Day 17 - Bird's Nest

An early morning taxi to the airport saw the start of 18 hours of travel. It’s nice to be back in one’s own bed, but I’m off to the airport again tomorrow for a 5:30am flight to Sydney for a workshop on exposure assessment with colleagues from the Centre for Air quality and health Research and evaluation. After a week of disastrously bad coffee, I’m glad that I booked accommodation which advertises itself as being 15m from the Toby’s Estate café.

Two pieces of good news this week

The full paper from the EMAC2013 conference last year is now available online. If you’re interested in the statistical methodology we used for estimating the inhaled dose of particles by students in the UPTECH project, you should check out our paper at the ANZIAM Journal (click the link that says “PDF” down the bottom under Full Text).

More importantly, though, we were successful in applying for an ARC Discovery Project! This project will run for three years and combines spatio-temporal statistical modelling, sensor miniaturisation and mobile phone technologies to allow people to minimise their exposure to air pollution. Our summary of the project, from the list of successful projects:

This interdisciplinary project aims to develop a personalised air pollution exposure monitoring system, leveraging the ubiquitousness and advancements in mobile phone technology and state of the art miniaturisation of monitoring sensors, data transmission and analysis. Airborne pollution is one of the top contemporary risks faced by humans; however, at present individuals have no way to recognise that they are at risk or need to protect themselves. It is expected that the outcome will empower individuals to control and minimise their own exposures. This is expected to lead to significant national socioeconomic benefits and bring global advancement in acquiring and utilising environmental information.

Other people at ILAQH were also successful in getting a Discovery Project grant looking at new particle formation and cloud formation in the Great Barrier Reef. I won’t be involved in that project but it sounds fascinating.

posterior samples

I probably should have put this post up earlier because it’s now a huge collection of stuff from the last month. Here we go!

It appears that Hilary Parker and I have similar (but by no means identical) work setups for doing stats (or at least we did two years ago). It’s never too late to come up with a sensible way of organising your work and collection of references/downloaded papers.

Applied statisticians should probably teach scientists what it is we do, rather than just the mathematics behind statistics. This is a difference I’ve noticed between SEB113 and more traditional statistics classes; we spend a lot less time discussing F distributions and a lot more time on model development and visualisation.

Speaking of visualisation, here’s a really great article on visualisation and how we can use small multiples and colour, shape, etc. to highlight the interesting differences so that it’s very clear what our message is.

Jeff Leek has compiled a list of some of the most awesome data people on Twitter who happen to be female.

In the ongoing crusade against abuse of p-values, we may want to instead focus on reproducibility to show that our results say what we say they do. Andrew Gelman and Eric Loken have an article in The American Statistician reminding us that p-values have a context and we need to be aware of issues like sample size, p-hacking, multiple comparisons, etc.

Posterior samples

SEB113 students really seemed to enjoy looking at mathematical modelling last week. The Lotka-Volterra equations continue to be a good teaching tool. A student pointed out that when reviewing the limit idea for derivatives it’d be useful to show it with approximating the circumference of a circle using a polygon. So I knocked this up:

approximations
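For anyone who would rather see the numbers than the picture, here is a minimal sketch of the same idea. It assumes a regular polygon inscribed in a unit circle (the function name and the choice of side counts are mine, purely for illustration): each side is a chord of length 2r sin(π/n), so the perimeter creeps up towards 2πr as n grows.

```python
import math


def inscribed_polygon_perimeter(n_sides, radius=1.0):
    """Perimeter of a regular n-sided polygon inscribed in a circle."""
    if n_sides < 3:
        raise ValueError("a polygon needs at least 3 sides")
    # Each side is a chord subtending an angle of 2*pi/n at the centre.
    side_length = 2.0 * radius * math.sin(math.pi / n_sides)
    return n_sides * side_length


circumference = 2.0 * math.pi  # unit circle
for n in (3, 6, 12, 24, 96, 1000):
    perimeter = inscribed_polygon_perimeter(n)
    shortfall = circumference - perimeter
    print(f"n = {n:4d}  perimeter = {perimeter:.6f}  shortfall = {shortfall:.6f}")
```

The shortfall shrinks as the number of sides increases, which is the same "take the limit" move as shrinking the step size in a difference quotient.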

Are you interested in big data and/or air quality? Consider doing a PhD with me.

This week I showed in the workshop how Markov chains are a neat application of linear algebra for dealing with probability. We used this interactive visualisation to investigate what happens as the transition probabilities change.
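A rough sketch of the linear algebra involved, not the workshop material itself: the two-state transition matrix below is a made-up example, and the helper names are hypothetical. The distribution over states is a row vector, and one step of the chain is that vector multiplied by the transition matrix.

```python
def step(dist, transition):
    """One step of the chain: row vector times transition matrix."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def distribution_after(dist, transition, n_steps):
    """Apply the transition matrix n_steps times to a starting distribution."""
    for _ in range(n_steps):
        dist = step(dist, transition)
    return dist


# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

start = [1.0, 0.0]  # begin in state 0 with certainty
for k in (1, 2, 5, 50):
    print(k, [round(p, 4) for p in distribution_after(start, P, k)])
```

For this particular matrix the distribution settles towards (5/6, 1/6) regardless of the starting state; changing the entries of `P` changes where it settles and how quickly.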

Zoubin Ghahramani has written a really nice review paper on Bayesian non-parametrics that I recommend checking out if you’re interested in the new modelling techniques that have been coming out in the last few years for complex data sets.

Exercism.io is a new service for learning how to master programming by getting feedback on exercises.

The problem with p values

A coworker sent me this article about alternatives to the default 0.05 p value in hypothesis testing as a way to improve the corpus of published articles so that we can actually expect reproducibility and have a bit more faith that these results are meaningful. The article is based on a paper published in the Proceedings of the National Academy of Sciences which talks about mapping Bayes Factors to p values for hypothesis tests so that there’s a way to think about the strength of the evidence.

The more I do and teach statistics the more I detest frequentist hypothesis testing (including whether a regression coefficient is zero) as a means of describing whether or not something plays a “significant” role in explaining some physical phenomenon. In fact, the entire idea of statistical significance sits ill with me because the way we tend to view it is that 0.051 is not significant and 0.049 is significant, even though there’s only a very small difference between the two. I guess if you’re dealing with cutoffs you’ve got to put the cutoff somewhere, but turning something which by its very nature deals with uncertainty into a set of rigid rules about what’s significant and what’s not seems pretty stupid.

My distaste for frequentist methods means that even for simple linear regressions I’ll fire up JAGS in R and fit a Bayesian model because I fundamentally disagree with the idea of an unknown but fixed true parameter. Further to this, the nuances of p values being distributed uniformly under the null hypothesis mean that we can very quickly make incorrect statements.
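To make the uniformity point concrete, here is a small simulation sketch. The setup is my own assumption for illustration: a two-sided z-test with known standard deviation, applied to samples that really do come from the null. With a continuous test statistic like this one, roughly 10% of the p values should land in each decile, and about 5% should fall below 0.05 even though there is nothing to find.

```python
import random
from statistics import NormalDist, fmean


def two_sided_z_test_p_value(sample, sigma=1.0):
    """p value for H0: mean = 0, assuming a known standard deviation sigma."""
    n = len(sample)
    z = fmean(sample) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_null_p_values(n_experiments=5000, sample_size=30, seed=1):
    """Repeat the test on data generated with the null hypothesis true."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_experiments):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        p_values.append(two_sided_z_test_p_value(sample))
    return p_values


p_values = simulate_null_p_values()

# Tally the p values into ten equal-width bins on [0, 1].
decile_counts = [0] * 10
for p in p_values:
    decile_counts[min(int(p * 10), 9)] += 1

print("share per decile:", [round(c / len(p_values), 3) for c in decile_counts])
print("share below 0.05:", round(sum(p < 0.05 for p in p_values) / len(p_values), 3))
```

So a "significant" result turns up about one time in twenty from noise alone, which is worth keeping in mind before reading too much into any single p value.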

I agree with the author of the article that shifting hypothesis testing p value goal posts won’t achieve what we want, and I’ll give the paper a closer read. For the time being, I’ll continue to just mull this over and grumble when people say “statistically significant” without any reference to a significance level.

NB: this post has been in an unfinished state since last November, when the paper started getting media coverage.

Revising another paper

We got a paper back from the reviewers a few days ago and there are some comments requesting revisions to the explanation of the statistical methods and the analysis. It’s interesting coming back to this paper, about a year after I last saw it (it’s been sent around to a few different journals to try to find a home for it). The PhD student who is the main author got into R and ggplot2 last year and has done some good work with linear mixed effects models and visualisation but some of the plots are the same sort of thing one might do in Excel (lots of boxplots next to each other rather than making use of small multiples).

So now I get to delve back into some data and analysis that’s about a year old with fresh eyes. Having done more with ggplot2 over the last 12 months, there are some things that I will definitely change about this. The student and I had a chat this morning about how to tackle it, and we’re trying to choose the best way to split up our data for visualisation: 6 treatments, 4 measurement blocks, two different measures (PM2.5 mass concentration and PNC), a total of 48 boxplots, density plots or histograms.
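As a back-of-the-envelope sketch of one way to split this up (the labels are placeholders, not the real treatment or block names, and the layout is just one option we might consider): facet by measure and block, and keep the six treatments side by side within each panel.

```python
from itertools import product

# Placeholder labels; the real treatment and block names are not given here.
treatments = [f"treatment {i}" for i in range(1, 7)]
blocks = [f"block {i}" for i in range(1, 5)]
measures = ["PM2.5 mass concentration", "PNC"]

# Every distribution we need to show: 2 measures x 4 blocks x 6 treatments.
combinations = list(product(measures, blocks, treatments))
print(len(combinations), "distributions in total")


def facet_layout(row_levels, col_levels, within_levels):
    """Map each (row, column) panel to the groups drawn inside it."""
    return {(r, c): list(within_levels) for r, c in product(row_levels, col_levels)}


# One candidate small-multiples layout: rows = measure, columns = block.
layout = facet_layout(measures, blocks, treatments)
print(len(layout), "panels, each comparing", len(treatments), "treatments")
```

That turns 48 side-by-side boxplots into a 2 by 4 grid of panels, with the treatment comparison (presumably the thing readers care about most) happening within each panel.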

A paper with another PhD student has had its open discussion finalised now, which means more writing is probably going to happen. I find ACP’s model quite interesting. The articles are peer reviewed, published for discussion, and then revised by the authors for final publication. I guess it spreads the review work out a bit and allows for multiple voices to be heard before final publication, each with a different approach and background.

That feeling when former students contact you

Last year I had a student in SEB113 who came in to the subject with a distaste for mathematics and statistics; they struggled with both the statistical concepts and the use of R throughout the semester and looked as though they would rather be anywhere else during the collaborative workshops. This student made it to every lecture and workshop, though, came to enjoy the work of using R for statistical analysis of data, and earned a 7 in the unit.

I just got an email from them asking for a reference for their VRES (Vacation Research Experience Scheme) project application. Not only am I proud of this student for working their butt off to get a 7 in a subject they disliked but came to find interesting, but I am over the moon to hear that they are interested in undertaking scientific field research. This student mentions how my “passion for teaching completely transformed my (their) view of statistics”, and their passion for the research topic is reflected in the email.

This sort of stuff is probably the most rewarding aspect of lecturing.