Posterior samples

I’ve been using Twitter for reading and our QUT Maths/Stats Slack domain for discussing maths, stats, data science, etc. over the last few months. So the way I use social media for work has changed a lot, and I’ve not been blogging as often. In any case, I figured there’s enough random stuff I encounter floating around that it could be good to restart the info dump that is Posterior samples.


A new year

It’s been about a year, and a lot’s happened since then. The Diagnostic Quiz has gone from a tool for helping me understand my students better to a tool to help students choose the right pathway through their Science degrees. Now, if a student does poorly on certain sections of the diagnostic, particularly calculus and algebra, we recommend they hold off on SEB113 until second semester and take MZB101 – Introductory Modelling with Calculus – in its place. While I’ve not yet had a look at all the enrolment data, anecdotally a number of students have contacted me about switching out and have appreciated getting the feedback that they will need to cover a bit more mathematics so that they can understand what they need for their degree.

Unfortunately, when a student unenrols from my unit I lose all of their assessment items, which means I don’t have a record of the results for the students who move into MZB101. Perhaps a platform other than Blackboard, one that doesn’t tie storage to enrolment as tightly, would be a useful way to approach this; MZB125 – Introductory Engineering Mathematics – uses WeBWorK for its diagnostic. At the end of the year I’d love to compare the end of semester marks of those students who transferred out with the marks of those who remained in SEB113 despite low scores on the diagnostic.

With a cohort with better general mathematics skills than before, we’ll be able to spend less time catching up on simple algebra and calculus and more time extending what is covered in high school. I’ve found some nice physics examples for linear algebra (circuits) and differential equations (Torricelli’s law) and will be trying to grab a few more examples that we haven’t used before, particularly for assessment.
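
As a rough sketch of how the Torricelli’s law example can be worked, consider a tank draining through a small hole. The outflow speed is sqrt(2gh), so the water height h satisfies dh/dt = -k sqrt(h) with k = (hole area / tank area) × sqrt(2g). The code below is in Python purely for illustration (the unit itself uses R), and the tank dimensions and function names are assumptions made up for this example, not taken from the SEB113 material. It compares the closed-form solution with a simple forward Euler approximation.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def drain_rate_constant(tank_area, hole_area):
    """k in dh/dt = -k * sqrt(h), using the outflow speed sqrt(2 * g * h)."""
    return (hole_area / tank_area) * math.sqrt(2.0 * G)


def height_exact(t, h0, tank_area, hole_area):
    """Closed-form height: sqrt(h) decreases linearly in time until empty."""
    k = drain_rate_constant(tank_area, hole_area)
    root = max(math.sqrt(h0) - 0.5 * k * t, 0.0)
    return root ** 2


def drain_time(h0, tank_area, hole_area):
    """Time at which the closed-form height reaches zero."""
    return 2.0 * math.sqrt(h0) / drain_rate_constant(tank_area, hole_area)


def height_euler(t_end, h0, tank_area, hole_area, dt=0.01):
    """Forward Euler approximation of the same differential equation."""
    k = drain_rate_constant(tank_area, hole_area)
    h = h0
    for _ in range(int(round(t_end / dt))):
        h = max(h - dt * k * math.sqrt(h), 0.0)
    return h


# Hypothetical tank: 1 m of water, 1 m^2 cross-section, 0.01 m^2 hole.
exact_at_10s = height_exact(10.0, 1.0, 1.0, 0.01)
euler_at_10s = height_euler(10.0, 1.0, 1.0, 0.01)
time_to_empty = drain_time(1.0, 1.0, 0.01)
```

Shrinking dt brings the Euler value closer to the closed-form one, which is a handy way to show students that the numerical and analytic approaches agree.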

There’s a little more movement in our tutorials and workshops towards using packages from the tidyverse for our data munging and analysis. When we started four years ago we were using base graphics, reshape and then reshape2, tapply(), and writing loops with par(mfrow=c(2,2)) style stuff to do small multiples. Since introducing ggplot2 a semester or two later, we’ve been working on making the analysis as coherent as possible so students aren’t having to move between different conceptual models of what data are, how they’re stored and how we operate on them. The use of the %>% pipe is left as a bonus for those who feel comfortable programming, but the rest of the class will still be learning about gather, spread, group_by, summarise, summarise_each, and mutate.
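
To make the conceptual model concrete, here is a minimal, language-neutral sketch of what a gather followed by group_by and summarise does to a small table. It is written in plain Python for illustration only. The helper names gather and group_summarise are hypothetical stand-ins and are not the tidyr or dplyr API, and the quiz scores are made-up toy values.

```python
from collections import defaultdict
from statistics import mean


def gather(rows, id_cols, key="variable", value="value"):
    """Reshape wide rows to long form: one output row per non-id column."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_cols}
        for col, val in row.items():
            if col not in id_cols:
                long_rows.append({**ids, key: col, value: val})
    return long_rows


def group_summarise(rows, by, value, fn=mean):
    """Split rows by the grouping columns and reduce each group with fn."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in by)].append(row[value])
    return {group: fn(values) for group, values in groups.items()}


# Toy wide table (made-up values): one row per student, one column per quiz.
wide = [
    {"student": "a", "quiz1": 4, "quiz2": 6},
    {"student": "b", "quiz1": 8, "quiz2": 10},
]

# Long form: one row per student-quiz pair.
long_form = gather(wide, id_cols=["student"], key="quiz", value="score")

# Mean score per quiz, keyed by a tuple of the grouping values.
quiz_means = group_summarise(long_form, by=["quiz"], value="score")
```

The point of the long form is that every later step (grouping, summarising, plotting) operates on the same shape of data, which is the coherence we are after in the tutorials.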

Oh, and I’m giving two two-hour lectures this semester, repeating for different groups within the cohort. It’s weird.

Diagnostics for first year students

The SEB113 teaching team last semester (me, Ruth Luscombe, Iwona Czaplinski, Brett Fyfield) wrote a paper for the HERDSA conference about the relationship between student engagement and success. We collected data on the timing of students’ use of the adaptive release tool we developed, where students confirm that they’ve seen some preparatory material before being given access to the lecture, computer lab and workshop material. We built a regression model that looked at the relationship between the number of weeks of material students gave themselves access to and their end of semester marks (out of 100%), and it showed that students who engaged more obtained better marks, where engagement also included active use of the Facebook group and attendance at workshop classes. I had assumed that we’d be able to get data on students’ maths backgrounds coming in, but with so many ways to enter university, we don’t have the background info on every student. QUT has set Queensland Senior Maths B as the assumed knowledge for SEB113 (and indeed the broader ST01 Bachelor of Science degree) and I’m interested in knowing whether or not the level of maths of students coming in has a bearing on how well they do over the course of the unit.

This semester, we decided that it’d be good to not just get a sense of the students’ educational backgrounds but to assess what their level of mathematical and statistical skills is. We designed a diagnostic to run in the first lecture that would canvass students on their educational background, their attitudes towards mathematics and statistics, and how well they could answer a set of questions that a student passing Senior Maths B would be able to complete. The questions were taken from the PhD thesis of Dr Therese Wilson and research published by Dr Helen MacGillivray (both at QUT), so I’m fairly confident we’re asking the right questions. One thing I really liked about Dr MacGillivray’s diagnostic tool, a multiple choice test designed for engineering students, is that each incorrect choice is wrong for a very specific reason, such as not getting the order of operations right, not recognising something as a difference of squares, etc.

I’m about to get the scanned and processed results back from the library and it turns out that a number of students didn’t put their name or student number on the answer sheet. Some put their names down but didn’t fill in the circles, so the machine that scans the answer sheet won’t be able to determine who the student is, and it’ll probably take some manual data entry on my part to ensure that we can get as many students as possible the results of their diagnostic. So while I’ll have a good sense of the class overall, and how we need to support them, it’ll be harder than it should be to target help at the people who need it.

Next semester I’ll try to run the same sort of thing, perhaps with a few modifications. We’ll need to be very clear about entering student numbers and names so that we can get everyone their own results. It’d be good to write a paper that follows on from our HERDSA paper and includes more information about educational background. It might also be interesting to check the relationship between students’ strength in particular topics (e.g. calculus, probability) and their marks on the corresponding items of assessment. Getting it right next semester and running it again in Semester 1 2017 would be a very useful way of gauging whether students who are weak in particular topics struggle to do well on certain pieces of assessment.


R Markdown

I’ve been spending a bit of time over the last few days making an R tutorial for the members of my air quality research group. Rather than being a very general introduction to the use of R, e.g. file input/output, loops, making objects, I’ve decided to show a very applied workflow that involves the actual data analysis and explaining ideas as we go along. Part of this philosophy is that I’m not going to write a statistics tutorial, opting instead to point readers to textbooks that deal with first year topics such as regression models and hypothesis tests.

It’s been a very interesting experience, and it’s meant having to deal with challenges along the way, such as PDF graphs that take up far too much file space for how (un-)important they are to the overall guide, and thinking about how to structure the tutorial so that I can assume zero experience with R but some experience with self-directed learning. The current version can be seen here.

One of the ideas that Sama Low Choy had for SEB113, when she was unit coordinator and lecturer and I was just a tutor, was to write a textbook for the unit because there wasn’t anything that really covered our approach. Since seeing computational stats classes in the USA being hosted as repositories on GitHub, I think it might be possible to use R Markdown or GitBook to write a project that could be compiled either as a textbook with exercises or as a set of slides.

Blogging about blogging

I was inspired to make a website and start blogging about my work when I went to 8BNP in 2011 and met people like Kevin Canini and Tamara Broderick who had websites to spruik themselves as researchers. I eventually got around to re-setting up my WordPress account, buying a domain and setting up the whole DNS shebang.

The last four years have seen some major changes in the web resources for research, with things like GitHub taking the place of Subversion and encouraging a more social and outward facing coding culture. You can blog using GitHub now, and Nick Tierney (a PhD student at QUT) has made me think about whether it’s worth migrating from WordPress to Jekyll. Further exposure to R Markdown through Di Cook’s workshop at Bayes on the Beach has strengthened my belief in RStudio not just as a way to do research but as a way to communicate it. This is even before we start considering all the things like shiny and embedded web stuff.

It’ll take some work and I’m not sure I’ll have time over summer, but it’s a change that’s probably worth making.

Two big pieces of news

I’ve just signed an acceptance of offer of employment which will take me fully back into maths at QUT, 50% teaching in the Mathematical Sciences School and 50% researching with Kerrie Mengersen under her ARC Laureate Fellowship. Over the last few years I’ve been supported variously by Professor Lidia Morawska in the International Laboratory for Air Quality and Health, the NHMRC Centre of Research Excellence for Air quality and health Research and evaluation, QUT’s Institute for Future Environments and the Mathematical Sciences School, to all of whom I’m very grateful.

The second piece of big news is that, with Ruth Luscombe and Nick Tierney, SEB113 has been recognised with a Vice-Chancellor’s Performance Award for innovation in teaching. We’ve put a lot of work into the unit this year, along with Iwona Czaplinski, Brett Fyfield, Jocelyne Bouzaid and Amy Stringer and the guidance of Ian Turner and Steve Stern. Ruth, Iwona, Brett and I have a paper accepted as part of an education conference next year and it’s a nice confirmation of all that we’ve done over the last 3 years (from Sama Low Choy’s first delivery when I was just a tutor). In that time we have taken the unit from a grab bag of topics that students didn’t feel was particularly well connected to a coherent series of lecture-lab-workshop sequences that introduce and reinforce six weeks each of mathematics and statistics topics. Students tell us these have helped them come to understand the role of quantitative analysis in science.


I had a very full week last week, with the annual Bayes on the Beach (BOB) at the Gold Coast (Mon-Wed) and Bayesian Optimal Design of Experiments (BODE) on Friday.

BOB is an annual workshop/retreat, run by Kerrie Mengersen and the BRAG group at QUT, that brings together a bunch of Australian and international statisticians for a few days of workshops, tutorials, presentations and fun in the sun. This year was, I think, my fourth year at BOB.

One of the recurring features is the workshop sessions, where around three researchers each pose a problem to the group and everyone decides which one they’re going to work on. This year I was asked to present a problem based on the air quality research I do and so my little group worked on the issue of how to build a predictive model of indoor PM10 based on meteorology, outdoor PM10 and temporal information. We were fortunate to have Di Cook in our group, who did a lot of interesting visual analysis of the data (she later presented a tutorial on how to use ggplot and R Markdown). We ended up discussing why tree models may not be such a great idea, the difference in autocorrelation and the usefulness of distributed lag models. It gave me a lot to think about and I hope that everyone found it as valuable as I did.
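
For readers who haven’t met distributed lag models, the idea is that the response depends on the current and several past values of a predictor: y_t = a + b_0 x_t + b_1 x_(t-1) + ... + b_L x_(t-L) + error. The sketch below is not the model our group built. It is a minimal stdlib-only Python illustration with a synthetic "outdoor" series and hypothetical coefficients, fitted by ordinary least squares via the normal equations, and all function names are made up for this example.

```python
import math


def lagged_design(x, max_lag):
    """Rows [1, x[t], x[t-1], ..., x[t-max_lag]] for t = max_lag .. len(x)-1."""
    return [[1.0] + [x[t - lag] for lag in range(max_lag + 1)]
            for t in range(max_lag, len(x))]


def solve(a, b):
    """Solve the square system a * coef = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_distributed_lag(y, x, max_lag):
    """Least squares estimate of [intercept, b_0, ..., b_max_lag]."""
    design = lagged_design(x, max_lag)
    target = y[max_lag:]  # the first max_lag responses lack a full lag history
    p = max_lag + 2
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * obs for row, obs in zip(design, target))
           for i in range(p)]
    return solve(xtx, xty)


# Synthetic, noise-free series (NOT real PM10 data), built so that the
# response depends on the predictor at lags 0, 1 and 2.
outdoor = [10.0 + 5.0 * math.sin(0.7 * t) + float((4 * t) % 11)
           for t in range(60)]
indoor = [0.0, 0.0] + [
    2.0 + 0.5 * outdoor[t] + 0.3 * outdoor[t - 1] + 0.1 * outdoor[t - 2]
    for t in range(2, len(outdoor))
]

coefficients = fit_distributed_lag(indoor, outdoor, max_lag=2)
```

With real data the errors are autocorrelated and the lag coefficients are usually constrained to vary smoothly, so plain least squares like this is only a starting point.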

The two other workshop groups worked on ranking the papers of Professor Richard Boys (one of the keynote speakers) and building a Bayesian Network model of PhD completion time. Both groups were better attended than mine, which I put down to the idea that those two were “fun” workshops and mine sounded a lot like work. Still, a diverse range of workshops means something for everyone.

James McGree (QUT) asked me if I could come to the BODE workshop to discuss some open challenges in air quality research with regard to experimental design. I gave a brief overview of regulatory monitoring and the UPTECH project’s random spatial selection. I then brought in the idea that the introduction of low cost sensors gives us the opportunity to measure in so many places at once, but we still need to sort out where we want to measure if we want to characterise human exposure to air pollution. While it was a small group, I did get to have a good chat with the attendees about some possible ways forward. It was also good to see Julian Caley (AIMS) talk about monitoring on the Great Barrier Reef, Professor Tony Pettitt (QUT) talk about sampling for intractable likelihoods and Tristan Perez (QUT) discuss the interplay between experimental design and the use of robots.

It’s been a great end to the year to spend it in the company of statisticians working on all sorts of interesting problems. While I do enjoy my air quality work, and R usage is increasing at ILAQH, it’s an entirely different culture to being around people who spend their time working out whether they’re better off with data.table and reshape2 or dplyr and tidyr.