Author Archives: Sam Clifford

About Sam Clifford

Statistician, improviser, board gamer.

Career Practitioners’ Day

I had a lot of fun this morning talking to a room full of career counsellors and others in similar roles about what it means to be a modern statistician/scientist working on large, multidisciplinary projects. I talked a little about my experiences as a student and how it took me a while to settle into my field, and I showed a few of the cool topics I get to work on. Everyone seemed keen to keep listening, and I even got feedback later that it was the first time they'd heard a mathematician talk about maths and found it interesting.

If there's one key point I hope people took away, it's that maths isn't just about doing maths; it's about solving problems. I mentioned my two favourite quotes to emphasise that studying maths, particularly in science, isn't just about doing calculations by hand:

“Essentially, all models are wrong but some are useful” – George Box (1978)

“Machines can do the work so humans have time to think” – IBM – The Paperwork Explosion (1967)


Finishing off projects

One of the issues with working on a number of multidisciplinary projects at the same time is that everything ends up taking longer than expected, and progress on any one project keeps getting interrupted. That said, the report for the Great Barrier Reef project I've been working on has been finalised and accepted, and the paper on modelling jaguar presences and abundances has been published.

Since I've been working on these larger projects, I've started putting together a site that serves as an alternative to a CV: a sort of research portfolio that lists the projects I've worked on and the papers that have come out of them. I can't list every paper with a description in my CV, as it would blow out to a huge number of pages and read more like a biography. The site is written in R Markdown and knitted to a Tufte-inspired HTML template, with a little CSS thrown in to modify the fonts and table of contents. It wasn't actually that difficult to do, and I learned a bit more about Markdown in the process. Next, I'd like to write a CSL file that styles the bibliography so that part of each reference itself carries the link, rather than having the URL tacked on the end, and so that authors' first names are abbreviated. That way the bottom half of the page won't be so cluttered.

I've been working with the Teaching and Learning team in QUT's Science and Engineering Faculty, and talking with the physics and chemistry academics, about improving the maths in the Bachelor of Science degree. Nothing's finalised yet in terms of long-term planning, but over the last few years we've gradually been addressing problems with the background maths skills students bring into the unit, and recommending strategies that will help them get through their degrees. Feedback from the mid-semester PULSE survey indicates that we're still doing a good job but probably need to rebalance a few topics and give a gentler introduction to R.

Since Nick Tierney came on board in SEB113, redid the lab worksheets in R Markdown and created videos showing how to work through the exercises, I've been gradually introducing more and more R Markdown into the teaching workflow. The pie-in-the-sky idea at the moment is to distribute lecture, lab and workshop material to students as a bookdown document that they can clone or fork from a GitHub repository and work on. Any changes made to the book could then be fetched, so students always have the most up-to-date version of the notes. The course could even be forked from one semester to the next, or the book versioned as releases. A number of the SEB113 tutors are sold on R Markdown and the ability to include R analysis and LaTeX formatting in a set of slides, a report or a webpage, so there'd definitely be the staff to do it. That said, there are more pressing issues to solve around content, and programming in general, before we try to push first year science students into using code-sharing platforms to download a textbook.

Posterior samples

Over the last few months I've been using Twitter for reading and our QUT Maths/Stats Slack domain for discussing maths, stats, data science, etc., so the way I use social media for work has changed a lot and I haven't been blogging as often. Still, I come across enough random stuff floating around that it seems worth restarting the info dump that is Posterior samples.


A new year

It's been about a year, and a lot's happened since then. The Diagnostic Quiz has gone from a tool for helping me understand my students better to a tool that helps students choose the right pathway through their Science degrees. Now, if a student does poorly on certain sections of the diagnostic, particularly calculus and algebra, we recommend they hold off on SEB113 until second semester and take MZB101 – Introductory Modelling with Calculus – in its place. While I haven't yet looked at all the enrolment data, anecdotally a number of students have contacted me about switching out and have appreciated the feedback that they need to cover a bit more mathematics first in order to understand what their degree requires.

Unfortunately, when a student unenrols from my unit I lose all of their assessment items, so I don't have a record of the results for the students who move into MZB101. Perhaps a platform other than Blackboard that doesn't tie storage to enrolment as tightly would help here (MZB125 – Introductory Engineering Mathematics – uses WeBWorK for its diagnostic). At the end of the year I'd love to compare the end-of-semester marks of students who transferred out with those of students who stayed in SEB113 despite low scores on the diagnostic.

With a cohort that has better general mathematics skills than before, we'll be able to spend less time catching up on simple algebra and calculus and more time extending what is covered in high school. I've found some nice physics examples for linear algebra (circuits) and differential equations (Torricelli's law), and will be looking for a few more examples that we haven't used before, particularly for assessment.
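As an illustration of the kind of example meant here (the tank dimensions and code are a hypothetical sketch, not material from the unit), Torricelli's law says water leaves a hole at the bottom of a tank at speed √(2gh). A tank of cross-sectional area A draining through a hole of area a therefore satisfies A dh/dt = −a√(2gh). Because d(√h)/dt is constant, the equation has a closed-form solution, √h(t) = √h₀ − (a/2A)√(2g)·t, and the tank empties at T = (A/a)√(2h₀/g). The sketch compares that solution with a forward Euler approximation, which is one way students could check a hand-derived answer numerically.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def height_exact(t, h0, A, a, g=G):
    """Closed-form water height for A dh/dt = -a*sqrt(2*g*h).

    sqrt(h) falls linearly at rate k = (a / (2A)) * sqrt(2g); once it
    reaches zero the tank stays empty.
    """
    k = (a / (2 * A)) * math.sqrt(2 * g)
    root = max(math.sqrt(h0) - k * t, 0.0)
    return root ** 2


def drain_time(h0, A, a, g=G):
    """Time for the tank to empty completely: T = (A/a) * sqrt(2*h0/g)."""
    return (A / a) * math.sqrt(2 * h0 / g)


def height_euler(t_end, h0, A, a, g=G, steps=100_000):
    """Forward Euler approximation of the same ODE, for comparison."""
    dt = t_end / steps
    h = h0
    for _ in range(steps):
        # dh/dt = -(a/A) * sqrt(2*g*h); clip at zero so sqrt stays real
        h = max(h - dt * (a / A) * math.sqrt(2 * g * h), 0.0)
    return h


if __name__ == "__main__":
    # Hypothetical tank: 1 m of water, 1 m^2 cross-section, 1 cm^2 hole.
    h0, A, a = 1.0, 1.0, 1e-4
    T = drain_time(h0, A, a)
    print(f"drain time: {T:.0f} s")
    for t in (0.0, T / 4, T / 2, T):
        exact = height_exact(t, h0, A, a)
        approx = height_euler(t, h0, A, a)
        print(f"t = {t:6.0f} s  exact h = {exact:.4f} m  Euler h = {approx:.4f} m")
```

Halfway through the drain time the exact height is a quarter of the starting height, because √h has dropped by half; the Euler column should agree with it to several decimal places.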

Our tutorials and workshops are moving a little further towards using tidyverse packages for data munging and analysis. When we started four years ago we were using base graphics, reshape and then reshape2, tapply(), and loops with par(mfrow=c(2,2))-style layouts to make small multiples. Since introducing ggplot2 a semester or two later, we've been working on making the analysis as coherent as possible, so that students don't have to move between different conceptual models of what data are, how they're stored and how we operate on them. The %>% pipe is left as a bonus for those who feel comfortable programming, but the rest of the class will still be learning about gather, spread, group_by, summarise, summarise_each and mutate.
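For readers who haven't met these verbs, the ideas behind them don't depend on the language. The sketch below mimics gather(), spread(), group_by() with summarise(), and mutate() using only Python's standard library, on a made-up table of quiz marks. It illustrates the concepts only; it is not a reimplementation of tidyr or dplyr, and the function and column names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up wide table: one row per student, one column per quiz (marks out of 20).
wide = [
    {"student": "s1", "quiz1": 12.0, "quiz2": 14.0},
    {"student": "s2", "quiz1": 9.0, "quiz2": 11.0},
]


def gather(rows, id_col, key, value):
    """Wide -> long: one row per (id, column) pair, like tidyr's gather()."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col != id_col:
                long_rows.append({id_col: row[id_col], key: col, value: val})
    return long_rows


def spread(rows, id_col, key, value):
    """Long -> wide: the inverse of gather()."""
    out = {}
    for row in rows:
        out.setdefault(row[id_col], {id_col: row[id_col]})[row[key]] = row[value]
    return list(out.values())


def group_summarise(rows, group_col, value, fn=mean):
    """One summary per group, like group_by() followed by summarise()."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value])
    return {g: fn(vals) for g, vals in groups.items()}


def mutate(rows, new_col, fn):
    """Add a column computed from each row, like mutate()."""
    return [{**row, new_col: fn(row)} for row in rows]


long_rows = gather(wide, "student", "quiz", "score")
print(group_summarise(long_rows, "student", "score"))  # mean mark per student
print(group_summarise(long_rows, "quiz", "score"))     # mean mark per quiz
print(mutate(long_rows, "pct", lambda r: 100 * r["score"] / 20))
print(spread(long_rows, "student", "quiz", "score") == wide)  # round trip
```

The coherence argument shows up here too: every step takes a list of rows and returns a list of rows (or a summary keyed by group), so the operations compose in much the same way the %>% pipe chains them in R.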

Oh, and this semester I'm giving two two-hour lectures, repeating the same material for different groups within the cohort. It's weird.

Diagnostics for first year students

Last semester's SEB113 teaching team (me, Ruth Luscombe, Iwona Czaplinski and Brett Fyfield) wrote a paper for the HERDSA conference about the relationship between student engagement and success. We collected data on the timing of students' use of the adaptive release tool we developed, in which students confirm that they've seen some preparatory material before being given access to the lecture, computer lab and workshop material. We built a regression model relating the number of weeks of material students gave themselves access to, along with other measures of engagement such as active use of the Facebook group and attendance at workshop classes, to their end-of-semester marks (as a percentage). It showed that students who engaged more obtained better marks. I had assumed that we'd be able to get data on students' maths backgrounds coming in, but with so many pathways into university, we don't have background information on every student. QUT has set Queensland Senior Maths B as the assumed knowledge for SEB113 (and indeed for the broader ST01 Bachelor of Science degree), and I'm interested in whether students' level of maths coming in has a bearing on how well they do over the course of the unit.

This semester we decided it'd be good not just to get a sense of students' educational backgrounds but also to assess their mathematical and statistical skills. We designed a diagnostic to run in the first lecture that would canvass students on their educational background and their attitudes towards mathematics and statistics, and test how well they could answer a set of questions that a student who passed Senior Maths B should be able to complete. The questions were taken from the PhD thesis of Dr Therese Wilson and from research published by Dr Helen MacGillivray (both at QUT), so I'm fairly confident we're asking the right questions. One thing I really liked about Dr MacGillivray's diagnostic tool, a multiple choice test designed for engineering students, is that each incorrect choice is wrong for a very specific reason, such as getting the order of operations wrong or not recognising a difference of squares.
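To make the idea concrete, here is a minimal sketch of how tagged distractors turn a multiple choice sheet into targeted feedback. The questions, option letters and misconception labels are made up for illustration and are not taken from Dr MacGillivray's or Dr Wilson's instruments, and the functions are hypothetical.

```python
from collections import Counter

# Hypothetical answer key. Each option maps to None if it is the correct
# answer, or to the specific misconception that produces that wrong answer.
ANSWER_KEY = {
    "Q1": {  # e.g. "Evaluate 2 + 3 x 4"
        "A": None,                     # 14 (correct)
        "B": "order of operations",    # 20: added before multiplying
    },
    "Q2": {  # e.g. "Factorise x^2 - 9"
        "A": "difference of squares",  # (x - 3)^2: pattern not recognised
        "B": None,                     # (x - 3)(x + 3) (correct)
    },
}


def diagnose(responses):
    """Score one student's sheet and list the misconceptions it reveals.

    `responses` maps question IDs to chosen options; questions left blank
    (or filled in with an unrecognised mark) are tallied as "no answer".
    """
    score = 0
    issues = Counter()
    for question, options in ANSWER_KEY.items():
        choice = responses.get(question)
        if choice not in options:
            issues["no answer"] += 1
        elif options[choice] is None:
            score += 1
        else:
            issues[options[choice]] += 1
    return score, issues


def class_summary(all_responses):
    """Total each misconception across the cohort, to guide class-wide support."""
    total = Counter()
    for responses in all_responses:
        _, issues = diagnose(responses)
        total += issues
    return total


if __name__ == "__main__":
    sheets = [
        {"Q1": "B", "Q2": "B"},
        {"Q1": "A", "Q2": "A"},
        {"Q1": "B"},
    ]
    for sheet in sheets:
        print(diagnose(sheet))
    print(class_summary(sheets))
```

Because each wrong option points at one specific error, the same responses give both per-student feedback and a cohort-level tally of which topics need support.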

I'm about to get the scanned and processed results back from the library, and it turns out that a number of students didn't put their name or student number on the answer sheet. Some wrote their names but didn't fill in the circles, so the scanning machine won't be able to identify them, and it'll take some manual data entry (probably on my part) to get as many students as possible the results of their diagnostic. So while I'll have a good sense of the class overall and how we need to support them, it'll be harder than it should be to target help at the students who need it.

Next semester I’ll try to run the same sort of thing, perhaps with a few modifications. We’ll need to be very clear about entering student numbers and names so that we can get everyone their own results. It’d be good to write a paper that follows on from our HERDSA paper and includes more information about educational background. It might also be interesting to check the relationship between students’ strength in particular topics (e.g. calculus, probability) and their marks on the corresponding items of assessment. Getting it right next semester and running it again in Semester 1 2017 would be a very useful way of gauging whether students who are weak in particular topics struggle to do well on certain pieces of assessment.


R Markdown

I've been spending a bit of time over the last few days making an R tutorial for the members of my air quality research group. Rather than writing a general introduction to R (file input/output, loops, making objects), I've decided to show a very applied workflow that works through an actual data analysis and explains ideas as we go along. Part of this philosophy is that I'm not going to write a statistics tutorial, opting instead to point readers to textbooks that cover first year topics such as regression models and hypothesis tests.

It's been a very interesting experience, and it's meant dealing with challenges along the way, such as PDF graphs that take up far more file space than their importance to the overall guide justifies, and working out how to structure the tutorial so that I can assume zero experience with R but some experience with self-directed learning. The current version can be seen here.

When Sama Low Choy was unit coordinator and lecturer for SEB113 and I was just a tutor, one of her ideas was to write a textbook for the unit, because nothing available really covered our approach. Having seen computational stats classes in the USA hosted as repositories on GitHub, I think it might be possible to use R Markdown or GitBook to write a project that could be compiled either as a textbook with exercises or as a set of slides.

Blogging about blogging

I was inspired to make a website and start blogging about my work when I went to 8BNP in 2011 and met people like Kevin Canini and Tamara Broderick, who had websites to spruik themselves as researchers. I eventually got around to setting my WordPress account up again, buying a domain and setting up the whole DNS shebang.

The last four years have seen some major changes in web resources for research, with things like GitHub taking the place of Subversion and encouraging a more social, outward-facing coding culture. You can blog using GitHub now, and Nick Tierney (a PhD student at QUT) has made me think about whether it's worth migrating from WordPress to Jekyll. Further exposure to R Markdown through Di Cook's workshop at Bayes on the Beach has strengthened my belief in RStudio as a way not just to do research but to communicate it. And that's before we start considering things like Shiny and embedded web content.

It’ll take some work and I’m not sure I’ll have time over summer, but it’s a change that’s probably worth making.