ALP wants to teach kids how to program, and I agree

I checked in on one of my workshop classes this morning to see how everyone was going in the final week, to remind them of the remaining help sessions and to check that they’re on track to complete their group assignments.

There weren’t many students in the class, what with it being week 13, but one of the students was very proud of the fact that she’d lifted her marks on the problem solving tasks from 1/10 to 8/10 over the course of the semester. She told me that going back over the last few workshops helped reinforce the coding that she needed to be able to do in order to complete the assessment.

She plans on transferring into medicine, which is not typically a career that requires programming. Yet at the end of the semester, with only one piece of assessment remaining and the decision already made to change out of science, she is still putting a lot of effort into understanding the statistics. Learning how to program is reinforcing that understanding and letting her engage more deeply than she could if we were restricted to the stats education I had in first year ten years ago, where we spent a lot of time looking up the tails of distributions in a book of tables.

Maths and statistics education (for students not studying maths/stats as a major) is no longer just about teaching long division in high school and calculus and point-and-click statistics methods at university. While some degrees, such as Electrical Engineering, Computer Science and IT, have traditionally been associated with some amount of programming, it’s becoming more and more common for maths and stats service units to include MATLAB or R as a means of engaging more deeply with the mathematical content, whether that’s understanding solutions to linear systems and differential equations or performing data analysis and visualisation. Learning to program leads to a better understanding of what you’re actually doing with the code.

Computers are everywhere in our students’ lives and in their educational experiences. Due to their ubiquity, the relationship students have with computers is very different to what it was 10 years ago. Computers are great at enabling access to knowledge through library databases, Wikipedia and a bunch of other online repositories. But it’s not enough to be able to look up the answer; one also has to be able to calculate an answer when it hasn’t been determined by someone else. There is not yet a mathematics or statistics package that does all of the data analysis and all of the mathematical analysis we might want to do in a classroom with a point-and-click, drag-and-drop interface.

To this end, I teach my students how to use R to solve a problem. Computers can do nearly anything, but we have to be able to tell the computer how to do it. Learning simple coding skills in school prepares students to tackle more advanced coding in quantitative units in their university studies, but it also teaches an understanding of how processes work in terms of inputs and outputs, and not just computational processes: it’s a literacy of processes and functions. Learning to code isn’t just about writing code as a profession, any more than teaching students to read is about preparing them for a career as a priest or newsreader. Coding provides another set of skills that are relevant to the future of learning and to participation in society and the workforce, just as learning mathematics allows people to understand things like bank loans.

Tony Abbott does not sound like he’s on board with the idea of giving kids the skills to get along in a world in which computers are part of our classrooms the way books were when he was going through school. While reading, writing and basic mathematics will continue to be important skills, literacy is more than just reading comprehension. Information literacy, being able to handle data, and being able to reason out a process are even more important thanks to the changing technologies we are experiencing. Not every student is going to be a professional programmer, an app developer or a big data analyst, but coding will be a skill that becomes more and more necessary as computers become more and more a part of our workplaces, not just as fancy typewriters or an instantaneous postal system but as a problem-solving tool.

Marrying differential equations and regression

Professor Fabrizio Ruggeri (Milan) visited the Institute for Future Environments for a little while in late 2013. He has been appointed as Adjunct Professor to the Institute and gave a public talk with a brief overview of a few of his research interests. Stochastic modelling of physical systems is something I was exposed to in undergrad when a good friend of mine, Matt Begun (who, it turns out, is doing a PhD under Professor Guy Marks, with whom ILAQH collaborates), suggested we do a joint Honours project in which we each tackled the same problem from different points of view: me as a mathematical modeller, him as a Bayesian statistician. It didn’t eventuate, but the idea stuck in my mind as an interesting topic.

In SEB113 we go through some non-linear regression models and the mathematical models that give rise to them. Regression typically features a fixed equation and variable parameters, while the mathematical modelling I’ve been exposed to features fixed parameters (elicited from lab experiments, previous studies, etc.) and numerical simulation of a differential equation to solve the system (as analytic methods aren’t always easy to employ). I found myself thinking “I wonder if there’s a way of doing both at once” and then shelved the thought because there was no way I would have the time to go and thoroughly research it.

Having spent a bit of time thinking about it, I’ve had a crack at solving an ODE within a Bayesian regression model (Euler’s method in JAGS) for logistic growth and the Lotka-Volterra equations. I’ve started having some discussions with other mathematicians about how we marry these two ideas and it looks like I’ll be able to start redeveloping my mathematical modelling knowledge.
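As a rough illustration of the idea (a minimal rjags sketch, not the exact code I used), here is logistic growth with the ODE Euler-stepped inside the model, so the growth rate, carrying capacity and initial size are estimated like any other regression parameters. The data name y, the spacing dt and the priors are assumptions made for the sketch:

```r
# A minimal sketch: Euler-stepping dN/dt = r * N * (1 - N / K) inside a JAGS
# model, treating r, K and the initial size N0 as unknowns to be estimated.
library(rjags)

logistic_model <- "
model {
  r   ~ dunif(0, 2)          # growth rate
  K   ~ dunif(0, 1000)       # carrying capacity
  N0  ~ dunif(0, 100)        # initial population size
  tau ~ dgamma(0.01, 0.01)   # observation precision

  mu[1] <- N0
  for (t in 2:T) {
    # one Euler step per observation; finer sub-steps would be more accurate
    mu[t] <- mu[t - 1] + dt * r * mu[t - 1] * (1 - mu[t - 1] / K)
  }
  for (t in 1:T) {
    y[t] ~ dnorm(mu[t], tau)
  }
}
"

# y is assumed to be a vector of population sizes observed dt time units apart
fit  <- jags.model(textConnection(logistic_model),
                   data = list(y = y, T = length(y), dt = 1), n.chains = 3)
post <- coda.samples(fit, c("r", "K", "N0"), n.iter = 5000)
summary(post)
```

The Lotka-Volterra case is the same trick with two coupled equations stepped together inside the loop.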

This is an area where I think applied statistics has a huge role to play in applied mathematical modelling. Mathematicians shouldn’t be constraining themselves to iterating over a grid of point estimates of parameters and then choosing the one which minimises some Lp norm (at least not without something like Approximate Bayesian Computation).

I mean, why explore regions of the parameter space that are unlikely to yield simulations that match up with the data? If you’re going to run a bunch of simulations, it should be done with the aim of not just finding the most probable parameter values but characterising the uncertainty in them. A grid of values, representing a very structured form of non-random prior, won’t give you that. Finding the maximum with some sort of gradient-based method will give you the most probable values but, again, doesn’t characterise uncertainty. Sometimes we don’t care about that uncertainty, but when we do we’re far better off using statistics and using it properly.

Just got back from China

I’ve spent the last few days travelling to and from Beijing, China for the launch of the new Australia-China Centre for Air Quality Science and Management. This is a huge thing for us at ILAQH because it sets up an international collaborative agreement between a bunch of institutions in Australia and China who each have a different set of expertise that they can bring to the table, allowing us to undertake more ambitious projects than before and seek funding from a wider range of sources.

There was a lot of prep and a lot of behind-the-scenes meetings on the first day, so Mandana (a colleague of mine) and I took a trip downtown to the Forbidden City and Tian Anmen Square and did some shopping at Wangfujing. It would have been cold enough without the wind, but it was such a clear day and everything was wonderful.

Day 14 - Tian Anmen Square Beijing

The first day of the launch, held at CRAES, featured the constituent groups giving short presentations about their work. There are a lot of great people working on some really interesting stuff, and I’m very excited about the prospect of working with some of them over the coming years.

Day 15 - Lina

The second day had us split into three groups to propose various objectives and projects and to put names on paper for who might be good leaders or key players in these fields as part of our centre. I joined the Transport Emissions group to discuss the control of emissions at their source, the investigation of atmospheric transformation processes and the development and uptake of new technologies. A lot of the groundwork had already been laid at a planning meeting earlier in the year, but it was good to put together some more concrete research topics.

After this, we went out to the National Jade Hotel for dinner, where we got to try another of the varied styles of Chinese cuisine, this time from a coastal region in north-eastern China. I wish I had paid more attention to the names of the various styles, but I enjoyed trying everything over the course of the trip, even the tripe.

Day 16 - The most important decision

The final day saw us tidying up the proposals, and an early finish meant that I got the afternoon off with Mandana, Felipe and Dion. After stumbling our way through a menu with pictures but no English translations, we had a big lunch and set off on the subway to the Temple of Heaven. It was certainly warmer than the Forbidden City (less stone, more trees) and was a very peaceful and pleasant end to the trip as we sat down at a bakery café and discussed If You Are The One while eating cream buns and drinking coffee (or in my case, peach black tea with milk). After heading back to the hotel, Mandana and I took a walk around the Bird’s Nest stadium which was only a block from our hotel. It looks like Beijing is putting effort into maintaining the area as a public plaza rather than just the grounds of a sports stadium, so even late in the cold evening it was full of families and groups of friends walking, talking and taking photographs.

Day 17 - Bird's Nest

An early morning taxi to the airport saw the start of 18 hours of travel. It’s nice to be back in one’s own bed, but I’m off to the airport again tomorrow for a 5:30am flight to Sydney for a workshop on exposure assessment with colleagues from the Centre for Air quality and health Research and evaluation. After a week of disastrously bad coffee, I’m glad that I booked accommodation which advertises itself as being 15m from the Toby’s Estate café.

Two pieces of good news this week

The full paper from the EMAC2013 conference last year is now available online. If you’re interested in the statistical methodology we used for estimating the inhaled dose of particles by students in the UPTECH project, you should check out our paper at the ANZIAM Journal (click the link that says “PDF” down the bottom under Full Text).

More importantly, though, we were successful in applying for an ARC Discovery Project! This project will run for three years and combines spatio-temporal statistical modelling, sensor miniaturisation and mobile phone technologies to allow people to minimise their exposure to air pollution. Our summary of the project, from the list of successful projects:

This interdisciplinary project aims to develop a personalised air pollution exposure monitoring system, leveraging the ubiquitousness and advancements in mobile phone technology and state of the art miniaturisation of monitoring sensors, data transmission and analysis. Airborne pollution is one of the top contemporary risks faced by humans; however, at present individuals have no way to recognise that they are at risk or need to protect themselves. It is expected that the outcome will empower individuals to control and minimise their own exposures. This is expected to lead to significant national socioeconomic benefits and bring global advancement in acquiring and utilising environmental information.

Other people at ILAQH were also successful in getting a Discovery Project grant looking at new particle formation and cloud formation in the Great Barrier Reef. I won’t be involved in that project but it sounds fascinating.

Posterior samples

I probably should have put this post up earlier because it’s now a huge collection of stuff from the last month. Here we go!

It appears that Hilary Parker and I have similar (but by no means identical) work setups for doing stats (or at least we did two years ago). It’s never too late to come up with a sensible way of organising your work and collection of references/downloaded papers.

Applied statisticians should probably teach scientists what it is we do, rather than just the mathematics behind statistics. This is a difference I’ve noticed between SEB113 and more traditional statistics classes; we spend a lot less time discussing F distributions and a lot more time on model development and visualisation.

Speaking of visualisation, here’s a really great article on visualisation and how we can use small multiples and colour, shape, etc. to highlight the interesting differences so that it’s very clear what our message is.

Jeff Leek has compiled a list of some of the most awesome data people on Twitter who happen to be female.

In the ongoing crusade against abuse of p-values, we may want to instead focus on reproducibility to show that our results say what we say they do. Andrew Gelman and Eric Loken have an article in The American Statistician reminding us that p-values have a context and we need to be aware of issues like sample size, p-hacking, multiple comparisons, etc.

Posterior samples

SEB113 students really seemed to enjoy looking at mathematical modelling last week. The Lotka-Volterra equations continue to be a good teaching tool. A student pointed out that when reviewing the limit idea for derivatives it’d be useful to show it by approximating the circumference of a circle with a polygon. So I knocked this up:

approximations
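The figure came from a quick bit of R; a minimal version of the idea (not the exact code behind the plot) is just the perimeter of an inscribed regular polygon converging to the circumference as the number of sides grows:

```r
# Perimeter of a regular n-gon inscribed in a unit circle: each side subtends
# an angle of 2*pi/n, so the perimeter is 2 * n * sin(pi / n), which tends to
# 2 * pi as n increases.
n <- c(3, 4, 6, 12, 24, 48, 96)
perimeter <- 2 * n * sin(pi / n)
data.frame(n, perimeter, error = 2 * pi - perimeter)
```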

Are you interested in big data and/or air quality? Consider doing a PhD with me.

This week in the workshop I showed how Markov chains are a neat application of linear algebra for dealing with probability. We used this interactive visualisation to investigate what happens as the transition probabilities change.
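If you want to poke at the same thing in R rather than in the browser, a small sketch (with a made-up two-state transition matrix) shows the distribution over states settling down as you keep multiplying by the transition matrix:

```r
# Two-state Markov chain: rows of P are the transition probabilities out of
# each state, and matrix multiplication pushes a distribution forward one step.
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)

x <- c(1, 0)                       # start with all probability in state 1
for (step in 1:50) x <- x %*% P    # converges towards the stationary distribution
x

# The stationary distribution is the left eigenvector of P with eigenvalue 1
ev <- eigen(t(P))
i  <- which.min(Mod(ev$values - 1))
stationary <- Re(ev$vectors[, i])
stationary / sum(stationary)
```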

Zoubin Ghahramani has written a really nice review paper on Bayesian non-parametrics that I recommend checking out if you’re interested in the new modelling techniques that have been coming out in the last few years for complex data sets.

Exercism.io is a new service for learning how to master programming by getting feedback on exercises.

The problem with p values

A coworker sent me this article about alternatives to the default 0.05 p value threshold in hypothesis testing, as a way to improve the corpus of published articles so that we can actually expect reproducibility and have a bit more faith that these results are meaningful. The article is based on a paper published in the Proceedings of the National Academy of Sciences which talks about mapping Bayes factors to p values for hypothesis tests so that there’s a way to think about the strength of the evidence.

The more I do and teach statistics the more I detest frequentist hypothesis testing (including whether a regression coefficient is zero) as a means of describing whether or not something plays a “significant” role in explaining some physical phenomenon. In fact, the entire idea of statistical significance sits ill with me because the way we tend to view it is that 0.051 is not significant and 0.049 is significant, even though there’s only a very small difference between the two. I guess if you’re dealing with cutoffs you’ve got to put the cutoff somewhere, but turning something which by its very nature deals with uncertainty into a set of rigid rules about what’s significant and what’s not seems pretty stupid.

My distaste for frequentist methods means that even for simple linear regressions I’ll fire up JAGS in R and fit a Bayesian model, because I fundamentally disagree with the idea of an unknown but fixed true parameter. Further to this, the nuances of p values being distributed uniformly under the null hypothesis mean that we can very quickly make incorrect statements.
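For what it’s worth, the sort of thing I mean is only a few lines with rjags; this is a minimal sketch rather than any particular analysis, and the data names (x, y) and vague priors are placeholders:

```r
# A minimal Bayesian simple linear regression in JAGS; x and y are assumed to
# be numeric vectors of the same length.
library(rjags)

lm_model <- "
model {
  alpha ~ dnorm(0, 1.0E-6)    # vague priors on intercept and slope
  beta  ~ dnorm(0, 1.0E-6)
  tau   ~ dgamma(0.01, 0.01)  # residual precision
  for (i in 1:N) {
    y[i] ~ dnorm(alpha + beta * x[i], tau)
  }
}
"

fit  <- jags.model(textConnection(lm_model),
                   data = list(x = x, y = y, N = length(y)), n.chains = 3)
post <- coda.samples(fit, c("alpha", "beta"), n.iter = 5000)
summary(post)  # posterior distributions rather than point estimates and p values
```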

I agree with the author of the article that shifting the hypothesis testing p value goal posts won’t achieve what we want, and I’ll have a closer read of the paper. For the time being, I’ll continue to just mull this over and grumble when people say “statistically significant” without any reference to a significance level.

NB: this post has been in an unfinished state since last November, when the paper started getting media coverage.