Diagnostics for first year students

The SEB113 teaching team last semester (me, Ruth Luscombe, Iwona Czaplinski, Brett Fyfield) wrote a paper for the HERDSA conference about the relationship between student engagement and success. We collected data on the timing of students’ use of the adaptive release tool we developed, where students confirm that they’ve seen some preparatory material before being given access to the lecture, computer lab and workshop material. We built a regression model that looked at the relationship between the number of weeks of material students gave themselves access to and their end of semester marks (out of 100%), and it showed that students who engaged more obtained better marks, where engagement also included active use of the Facebook group and attendance at workshop classes. I had assumed that we’d be able to get data on students’ maths backgrounds coming in, but with so many ways to enter university, we don’t have the background info on every student. QUT has set Queensland Senior Maths B as the assumed knowledge for SEB113 (and indeed the broader ST01 Bachelor of Science degree) and I’m interested in knowing whether or not the level of maths of students coming in has a bearing on how well they do over the course of the unit.

This semester, we decided that it’d be good to not just get a sense of the students’ educational backgrounds but to assess what their level of mathematical and statistical skills are. We designed a diagnostic to run in the first lecture that would canvas students on their educational background, their attitudes towards mathematics and statistics, and how well they could answer a set of questions that a student passing Senior Maths B would be able to complete. The questions were taken from the PhD thesis of Dr Therese Wilson and research published by Dr Helen MacGillivray (both at QUT), so I’m fairly confident we’re asking the right questions. One thing I really liked about Dr MacGillivray’s diagnostic tool, a multiple choice test designed for engineering students, is that each incorrect choice is wrong for a very specific reason, such as not getting the order of operations right, not recognising something as a difference of squares, etc.

I’m about to get the scanned and processed results back from the library and it turns out that a number of students didn’t put their name or student number on the answer sheet. Some put their names down but didn’t fill in the circles, so the machine that scans the answer sheet won’t be able to determine who the student is and it’ll take some manual data entry probably on my part to ensure that we can get as many students as possible the results of their diagnostic. So while I’ll have a good sense of the class overall, and how we need to support them, it’ll be harder than it should be to ensure that the people who need the help are able to be targetted for such help.

Next semester I’ll try to run the same sort of thing, perhaps with a few modifications. We’ll need to be very clear about entering student numbers and names so that we can get everyone their own results. It’d be good to write a paper that follows on from our HERDSA paper and includes more information about educational background. It might also be interesting to check the relationship between students’ strength in particular topics (e.g. calculus, probability) and their marks on the corresponding items of assessment. Getting it right next semester and running it again in Semester 1 2017 would be a very useful way of gauging whether students who are weak in particular topics struggle to do well on certain pieces of assessment.


R Markdown

I’ve been spending a bit of time over the last few days making an R tutorial for the members of my air quality research group. Rather than being a very general introduction to the use of R, e.g. file input/output, loops, making objects, I’ve decided to show a very applied workflow that involves the actual data analysis and explaining ideas as we go along. Part of this philosophy is that I’m not going to write a statistics tutorial, opting instead to point readers to textbooks that deal with first year topics such as regression models and hypothesis tests.

It’s been a very interesting experience, and it’s meant having to deal with challenges along the way such as PDF graphs that take up so much file space for how (un-)important they are to the overall guide and, thinking about how to structure the tutorial so that I can assume zero experience with R but some experience with self-directed learning. The current version can be seen here.

One of the ideas that Sama Low Choy had for SEB113 when she was unit coordinator and lecturer and I was just a tutor, was to write a textbook for the unit because there wasn’t anything that really covered our approach. Since seeing computational stats classes in the USA being hosted as repositories on GitHub I think it might be possible to use R Markdown or GitBook to write an R Markdown project that could be compiled either as a textbook with exercises or as a set of slides.

Blogging about blogging

I was inspired to make a website and start blogging about my work when I went to 8BNP in 2011 and met people like Kevin Canini and Tamara Broderick who had websites to spruik themselves as researchers. I eventually got around to re-setting up my WordPress account, buying a domain and setting up the whole DNS shebang.

The last four years have seen some major changes in the web resources for research, with things like github taking the place of subversion and encouraging a more social and outward facing coding culture. You can blog using github now, and Nick Tierney (a PhD student at QUT) has made me think about whether it’s worth migrating from WordPress to jekyll. Further exposure to R Markdown through Di Cook’s workshop at Bayes on the Beach has strengthened my belief in RStudio not just as a way to do research but to communicate it. This is even before we start considering all the things like shiny and embedded web stuff.

It’ll take some work and I’m not sure I’ll have time over summer, but it’s a change that’s probably worth making.

Two big pieces of news

I’ve just signed an acceptance of offer of employment which will take me fully back into maths at QUT, 50% teaching in the Mathematical Sciences School and 50% researching with Kerrie Mengersen under her ARC Laureate Fellowship. Over the last few years I’ve been supported variously by Professor Lidia Morawska in the International Laboratory for Air Quality and Health, the NHMRC Centre of Research Excellence for Air quality and health Research and evaluation, QUT’s Institute for Future Environments and Mathematical Sciences School to whom I’m very grateful.

The second piece of big news is that with Ruth Luscombe and Nick Tierney, SEB113 has been recognised with a Vice-Chancellor’s Performance Award for innovation in teaching. We’ve put a lot of work into the unit this year, along with Iwona Czaplinski, Brett Fyfield, Jocelyne Bouzaid and Amy Stringer and the guidance of Ian Turner and Steve Stern. Ruth, Iwona, Brett and I have a paper accepted as part of an education conference next year and it’s a nice confirmation of all that we’ve done over the last 3 years (from Sama Low Choy’s first delivery when I was just a tutor) to take the unit from a grab bag of topics that students didn’t feel was particularly well connected to a coherent series of lecture-lab-workshop sequences that introduce and reinforce six weeks of each of mathematics and statistics topics that students tell us have helped them come to understand the role of quantitative analysis in science.


I had a very full week last week, with the annual Bayes on the Beach (BOB) at the Gold Coast (Mon-Wed) and Bayesian Optimal Design of Experiments  (BODE) on Friday.

BOB is an annual workshop/retreat, run by Kerrie Mengersen and the BRAG group at QUT, that brings together a bunch of Australian and international statisticians for a few days of workshops, tutorials, presentations and fun in the sun. This year was, I think, my fourth year at BOB.

One of the recurring features is the workshop sessions, where around three researchers each pose a problem to the group and everyone decides which one they’re going to work on. This year I was asked to present a problem based on the air quality research I do and so my little group worked on the issue of how to build a predictive model of indoor PM10 based on meteorology, outdoor PM10 and temporal information. We were fortunate to have Di Cook in our group, who did a lot of interesting visual analysis of the data (she later presented a tutorial on how to use ggplot and R Markdown). We ended up discussing why tree models may not be such a great idea, the difference in autocorrelation and the usefulness of distributed lag models. It gave me a lot to think about and I hope that everyone found it as valuable as I did.

The two other workshop groups worked on ranking the papers of Professor Richard Boys (one of the keynote speakers) and building a Bayesian Network model of PhD completion time. Both groups were better attended than mine, which I put down to the idea that those two were “fun” workshops and mine sounded a lot like work. Still, a diverse range of workshops means something for everyone.

James McGree (QUT) asked me if I could come to the BODE workshop to discuss some open challenges in air quality research with regards to experimental design. I gave a brief overview of regulatory monitoring, the UPTECH project’s random spatial selection and then brought in the idea that the introduction of low cost sensors gives us the opportunity to measure in so many places at once but we still need to sort out where we want to measure if we want to characterise human exposure to air pollution. While it was a small group I did get to have a good chat with the attendees about some possible ways forward. It was also good to see Julian Caley (AIMS) talk about monitoring on the Great Barrier Reef, Professor Tony Pettitt (QUT) talk about sampling for intractable likelihoods and Tristan Perez (QUT) discuss the interplay between experimental design and the use of robots.

It’s been a great end to the year to spend it in the company of statisticians working on all sorts of interesting problems. While I do enjoy my air quality work and R usage is increasing at ILAQH it’s an entirely different culture to being around people who spend their time working out whether they’re better off with data.table and reshape2 or dplyr and tidyr.

Australia-China Centre turns 1

Has it already been a year?

This week the Australia-China Centre for Air Quality Science and Management had its second annual meeting, at QUT. We got updates on the various research activities that have happened, are happening and are planned. There’s lots of interesting stuff being done to tackle a variety of problems, such as reducing workplace exposure to air pollution, quantifying the exposure of individuals and using unmanned aerial vehicles to measure air quality.


Tuesday night we had the conference dinner out at the Mount Coot-tha Botanic Gardens, at the function space at the cafe/restaurant out there. I don’t think I’ve been there since my cousin’s wedding reception 15-20 years ago. I really liked that efforts were made to ensure each table had a mix of senior professors, mid- and early-career researchers and PhD students. It made for a very inclusive dinner and many different topics of conversation. Luckily I was sat with a co-worker with whom I could trade my fish entree and mains for something a little more land-based. There was even a birthday cake (chocolate mousse cake) and a number of people joined in singing “Happy Birthday” to the ACC.

Wednesday we spent the day workshopping the various planned projects to determine what issues need to be addressed in the collection and analysis of data. I ended up sitting with a group looking at the impacts of indoor temperature on mortality rates, particularly trying to estimate the relative risk of extreme heat and cold. It was good to be confronted with some new challenges to think about, rather than the same stuff I’ve been working on almost non-stop this year.

All in all, it was a good meeting even though the stress levels around here were through the roof in the lead-up. I ended up taking photos of nearly all of the presenters on the Tuesday as well as group photos with our Chinese collaborators and special invited guests.

ALP wants to teach kids how to program, and I agree

I checked in on one of my workshop classes this morning to see how everyone was going in the final week, to remind them of the remaining help sessions and to check that they’re on track to complete their group assignments.

There weren’t many students in the class, what with it being week 13, but of one of the students was very proud of the fact that she’d lifted her marks on the problem solving tasks from 1/10 to 8/10 over the course of the semester. She told me that going back over the last few workshops helped reinforce the coding that she needed to be able to do in order to complete the assessment.

She plans on transferring into medicine, which is typically not a career that requires programming. At the end of the semester, with only one piece of assessment remaining and the decision made that she will change out of science, she is still putting a lot of effort into understanding the statistics and learning how to program is reinforcing this and allowing her to engage deeper than if we were restricted to the stats education I had in first year ten years ago where we spent a lot of time looking up the tails of distributions in a book of tables.

Maths and statistics education (for students not studying maths/stats as a major) is no longer just about teaching students how to do long division in high school and calculus and point and click statistics methods at university. While some degree such as Electrical Engineering, Computer Science and IT have traditionally been associated with some amount of programming, it’s becoming more and more common for maths and stats service units to include MATLAB or R as a means of engaging deeper with the mathematical content and understanding solutions to linear systems and differential equations or performing data analysis and visualisation. Learning to program leads to better understanding of what you’re actually doing with the code.

Computers are everywhere in our students’ lives and in their educational experiences. Due to their ubiquity, the relationship students have with computers is very different to what it was 10 years ago. Computers are great at enabling access to knowledge through library databases, Wikipedia and a bunch of other online repositories. But it’s not enough to be able to look up the answer, one also has to be able to calculate an answer when it hasn’t been determined by someone else. There is not yet a mathematics or statistics package that does all of the data analysis and all of the mathematical analysis that we might want to do in a classroom with a point and click, drag and drop interface.

To this end, I teach my students how to use R to solve a problem. Computers can do nearly anything, but we have to be able to tell the computer how to do it. Learning simple coding skills in school prepares students to tackle more advanced coding in quantitative units in their university studies but it also teaches an understanding of how processes work based on inputs and outputs, and not just computational processes, it’s all about a literacy of processes and functions (inputs and outputs). Learning to code isn’t just about writing code as a profession no more than teaching students to read is done to prepare them in their profession of priest or newsreader. Coding provides another set of skills that are relevant to the future of learning and participation in society and the workforce, just as learning mathematics allows people to understand things like bank loans.

Tony Abbott does not sound like he’s on board with the idea of giving kids the skills to get along in a world in which computers are part of our classroom the way books were when he was going through school. While reading, writing and basic mathematics skills will continue to be important skills, literacy is more than just reading comprehension. Information literacy, being able to handle data, and being able to reason out a process are even more important thanks to the changing technologies we are experiencing. Not every student is going to be a professional programmer, an app developer or big data analyst, but coding will be a skill which becomes more and more necessary as computers become more and more a part of our workplace not just as fancy typewriters or an instantaneous postal system but as a problem solving tool.