Tag Archives: education

Diagnostics for first year students

The SEB113 teaching team last semester (me, Ruth Luscombe, Iwona Czaplinski, Brett Fyfield) wrote a paper for the HERDSA conference about the relationship between student engagement and success. We collected data on the timing of students’ use of the adaptive release tool we developed, where students confirm that they’ve seen some preparatory material before being given access to the lecture, computer lab and workshop material. We built a regression model relating students’ end-of-semester marks (as a percentage) to the number of weeks of material they gave themselves access to. It showed that students who engaged more obtained better marks, where engagement also included active use of the Facebook group and attendance at workshop classes. I had assumed that we’d be able to get data on students’ maths backgrounds coming in, but with so many ways to enter university, we don’t have background information on every student. QUT has set Queensland Senior Maths B as the assumed knowledge for SEB113 (and indeed for the broader ST01 Bachelor of Science degree), and I’m interested in whether students’ incoming level of maths has a bearing on how well they do over the course of the unit.
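
For readers who haven’t seen one, the core of a regression like this is an ordinary least-squares fit. The sketch below is a minimal single-predictor version in Python; the model in the paper also took Facebook activity and workshop attendance into account, and the function name and toy numbers here are made up for illustration, not data from the study.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy numbers only (hypothetical weeks of material unlocked vs. mark out of 100)
weeks = [0, 1, 2, 3]
marks = [10, 13, 16, 19]
intercept, slope = fit_simple_ols(weeks, marks)
print(f"intercept = {intercept:.1f}, slope = {slope:.1f} marks per week")
```

With these perfectly linear toy values the fit recovers an intercept of 10 and a slope of 3. Real marks are noisy, and a slope on its own says nothing about whether engagement causes better marks.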

This semester, we decided that it’d be good not just to get a sense of the students’ educational backgrounds but also to assess their mathematical and statistical skills. We designed a diagnostic to run in the first lecture that would canvass students on their educational background, their attitudes towards mathematics and statistics, and how well they could answer a set of questions that a student passing Senior Maths B would be able to complete. The questions were taken from the PhD thesis of Dr Therese Wilson and research published by Dr Helen MacGillivray (both at QUT), so I’m fairly confident we’re asking the right questions. One thing I really liked about Dr MacGillivray’s diagnostic tool, a multiple choice test designed for engineering students, is that each incorrect choice is wrong for a very specific reason, such as getting the order of operations wrong or not recognising a difference of squares.
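
That distractor design pays off once answers are scanned: if every wrong option is tagged with the misconception it targets, a student’s choice tells you what went wrong, not just that something did. A minimal sketch, assuming a hypothetical item and hypothetical misconception labels (this is not a question from Dr MacGillivray’s instrument):

```python
from collections import Counter

# Hypothetical item: each wrong option is keyed to the misconception it targets.
ITEM = {
    "stem": "Expand (x + 3)(x - 3)",
    "correct": "A",  # x^2 - 9
    "misconceptions": {
        "B": "sign error on the constant term",                         # x^2 + 9
        "C": "cross terms added, not cancelled (missed difference of squares)",  # x^2 - 6x - 9
        "D": "treated x * x as 2x",                                     # 2x - 9
    },
}


def diagnose(responses, item=ITEM):
    """Classify each student's choice as correct, a named misconception, or blank/invalid.

    responses: dict mapping student id -> chosen option letter (None if left blank).
    Returns (per_student, class_tally).
    """
    per_student = {}
    for student, choice in responses.items():
        if choice == item["correct"]:
            per_student[student] = "correct"
        else:
            per_student[student] = item["misconceptions"].get(choice, "no/invalid answer")
    return per_student, Counter(per_student.values())


per_student, tally = diagnose({"s1": "A", "s2": "C", "s3": None, "s4": "C"})
for label, count in tally.most_common():
    print(count, label)
```

The per-student labels are what you would send back to individuals; the class tally is what tells you which topics need revisiting for everyone.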

I’m about to get the scanned and processed results back from the library, and it turns out that a number of students didn’t put their name or student number on the answer sheet. Some wrote their names down but didn’t fill in the circles, so the machine that scans the answer sheets won’t be able to identify them, and it’ll take some manual data entry (probably on my part) to get as many students as possible the results of their diagnostic. So while I’ll have a good sense of the class overall and how we need to support them, it’ll be harder than it should be to target help at the people who need it.
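
Sorting the returned sheets into those that can be matched automatically and those that need a human to look at them is simple once you have the class list. A sketch, assuming each scanned sheet arrives as a dict with a hypothetical `student_number` field (None when the circles were left blank) and that the roster is a set of enrolled student numbers; the numbers and scores below are made up.

```python
def split_scans(scans, roster):
    """Split scanned answer sheets into matchable ones and ones needing manual entry.

    scans:  list of dicts with a hypothetical 'student_number' key
            (None if the student left those circles blank).
    roster: set of student numbers enrolled in the unit.
    """
    matched, needs_manual = [], []
    for sheet in scans:
        number = sheet.get("student_number")
        if number is not None and number in roster:
            matched.append(sheet)
        else:
            # Blank, misread or mistyped numbers all end up here for a human to check
            needs_manual.append(sheet)
    return matched, needs_manual


roster = {"10000001", "10000002", "10000003"}  # made-up student numbers
scans = [
    {"student_number": "10000001", "score": 14},
    {"student_number": None, "score": 9},         # name written, circles not filled
    {"student_number": "10000009", "score": 11},  # number not on the roster
]
matched, needs_manual = split_scans(scans, roster)
print(len(matched), "matched;", len(needs_manual), "need manual entry")
```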

Next semester I’ll try to run the same sort of thing, perhaps with a few modifications. We’ll need to be very clear about entering student numbers and names so that we can get everyone their own results. It’d be good to write a paper that follows on from our HERDSA paper and includes more information about educational background. It might also be interesting to check the relationship between students’ strength in particular topics (e.g. calculus, probability) and their marks on the corresponding items of assessment. Getting it right next semester and running it again in Semester 1 2017 would be a very useful way of gauging whether students who are weak in particular topics struggle to do well on certain pieces of assessment.



ALP wants to teach kids how to program, and I agree

I checked in on one of my workshop classes this morning to see how everyone was going in the final week, to remind them of the remaining help sessions and to check that they’re on track to complete their group assignments.

There weren’t many students in the class, what with it being week 13, but one of the students was very proud of the fact that she’d lifted her marks on the problem-solving tasks from 1/10 to 8/10 over the course of the semester. She told me that going back over the last few workshops helped reinforce the coding she needed to be able to do to complete the assessment.

She plans on transferring into medicine, which is typically not a career that requires programming. Even at the end of the semester, with only one piece of assessment remaining and having decided to change out of science, she is still putting a lot of effort into understanding the statistics. Learning how to program is reinforcing this and allowing her to engage more deeply than if we were restricted to the stats education I had in first year ten years ago, where we spent a lot of time looking up the tails of distributions in a book of tables.

Maths and statistics education (for students not studying maths/stats as a major) is no longer just about teaching students long division in high school and calculus and point-and-click statistics at university. While some degrees, such as Electrical Engineering, Computer Science and IT, have traditionally involved some amount of programming, it’s becoming more and more common for maths and stats service units to use MATLAB or R as a means of engaging more deeply with the mathematical content, whether that’s understanding solutions to linear systems and differential equations or performing data analysis and visualisation. Learning to program leads to a better understanding of what you’re actually doing with the code.

Computers are everywhere in our students’ lives and in their educational experiences. Because they are so ubiquitous, the relationship students have with computers is very different to what it was 10 years ago. Computers are great at enabling access to knowledge through library databases, Wikipedia and a bunch of other online repositories. But it’s not enough to be able to look up the answer; one also has to be able to calculate an answer when it hasn’t been determined by someone else. There is not yet a mathematics or statistics package that does all of the data analysis and mathematical analysis that we might want to do in a classroom with a point-and-click, drag-and-drop interface.

To this end, I teach my students how to use R to solve problems. Computers can do nearly anything, but we have to be able to tell the computer how to do it. Learning simple coding skills in school prepares students to tackle more advanced coding in quantitative units at university, but it also teaches an understanding of how processes work in terms of inputs and outputs. This isn’t limited to computational processes: it’s a literacy of processes and functions. Learning to code isn’t about preparing to write code as a profession, any more than teaching students to read is about preparing them for a career as a priest or newsreader. Coding provides another set of skills that are relevant to the future of learning and participation in society and the workforce, just as learning mathematics allows people to understand things like bank loans.

Tony Abbott does not sound like he’s on board with the idea of giving kids the skills to get along in a world where computers are part of the classroom the way books were when he was going through school. While reading, writing and basic mathematics will continue to be important skills, literacy is more than just reading comprehension. Information literacy, being able to handle data and being able to reason out a process are becoming even more important thanks to the changing technologies we are experiencing. Not every student is going to be a professional programmer, an app developer or a big data analyst, but coding will become more and more necessary as computers become part of our workplaces, not just as fancy typewriters or an instantaneous postal system but as problem-solving tools.

That feeling when former students contact you

Last year I had a student in SEB113 who came into the subject with a distaste for mathematics and statistics; they struggled with both the statistical concepts and the use of R throughout the semester and looked as though they would rather be anywhere else during the collaborative workshops. This student made it to every lecture and workshop, though, came to enjoy using R for the statistical analysis of data, and earned a 7 (the top grade) in the unit.

I just got an email from them asking for a reference for their VRES (Vacation Research Experience Scheme) project application. Not only am I proud of this student for working their butt off to get a 7 in a subject they disliked but came to find interesting, but I am over the moon to hear that they are interested in undertaking scientific field research. This student mentions how my “passion for teaching completely transformed my (their) view of statistics”, and their passion for the research topic is reflected in the email.

This sort of stuff is probably the most rewarding aspect of lecturing.

Posterior samples

ARC Discovery Projects have been returned to their authors, and we are putting together our rejoinders. It was interesting to see a comment suggesting that we use the less restrictive CC BY licence instead of the CC BY-NC-SA we’d proposed. We weren’t successful in our Linkage Project applications, which is disappointing as they were interesting projects (well, we thought so). Bringing in research funding is an ongoing struggle for all research groups, and I feel it’s only going to get harder as the new federal government’s research priorities appear to be more aligned to medical science that delivers treatments than to our group’s traditional strengths.

SEB113 is pretty much over for the semester, with marks entered for almost every student. Overall I think the students did fairly well. We had some issues with the timetable this semester. Ideally, we’d like the lecture first, then all of the computer labs, then all of the workshops, so that we can introduce a statistical idea, show the code and then apply the idea and code in a group setting. Next semester, we have the lecture followed immediately by the workshops, with the computer labs dotted throughout the remainder of the week. This gives us an opportunity to try some semi-flipped classroom ideas, where students are able (and expected) to do the computer lab at home at their own pace rather than watch a tutor explain it one line at a time at the front of a computer lab.

I’m teaching part of a two-day course on the use of R in air pollution epidemiology. My part will give a brief overview of Bayesian statistics, discuss prior distributions as a means of encoding a priori beliefs about model parameters, and cover the use of Bayesian hierarchical modelling (as opposed to more traditional ANOVA techniques) as a way of making the most of the data that’s been collected. The other two presenters are Dr Peter Baker and Dr Yuming Guo. The course is being run by the CAR-CRE, which partially funds my postdoctoral fellowship.

I had meant to post this back when they were doing the rounds: there’s a collection of plots that attempts to show that correlation isn’t causation and that spurious correlations exist in large data sets. Tom Christie has responded by pointing out that correlation in time series isn’t as simple as it is for independent, identically distributed data. One should be careful that one’s criticism of bad statistics is itself founded on good statistics.
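
One standard illustration of why trending series need care: two series that share a trend correlate strongly in their levels even when their period-to-period movements are completely unrelated, and differencing removes the shared trend. This is a generic, deterministic sketch with made-up series, not a reproduction of Tom Christie’s analysis.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def first_differences(series):
    """Change from one time step to the next."""
    return [b - a for a, b in zip(series, series[1:])]


n = 101
x = [t + (-1) ** t for t in range(n)]              # shared trend + one wiggle
y = [t + (1, 1, -1, -1)[t % 4] for t in range(n)]  # shared trend + an unrelated wiggle

level_corr = pearson(x, y)
diff_corr = pearson(first_differences(x), first_differences(y))
print(f"levels: {level_corr:.3f}, first differences: {diff_corr:.3f}")
```

The levels are almost perfectly correlated purely because both series go up over time, while the changes are uncorrelated. Differencing is only one option; detrending or modelling the autocorrelation directly are others, and which is appropriate depends on the series.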

Posterior samples

A rough guide to spotting bad science.

Why big data is in trouble: they forgot about applied statistics. Big data analytics are all well and good, but you have to keep in mind that there are statistical properties that govern which inferences are valid.

While I’m comfortable giving lectures, I really struggled to get through them as an undergrad. It turns out they may not be the most effective way to get information to students.

My supervisor, Professor Lidia Morawska, is giving a public talk (free to register) at QUT soon: “Air Quality Reports On Our Mobiles – Do We Care?”, on June 6 2014.

The ongoing crusade against Excel-based analysis

One of the things I catch myself saying quite often in SEB113 is “This is new. It’s hard. But remember, you weren’t born knowing how to walk. You learned it”, as my way of saying that it’s okay not to understand this straight away; it takes time, practice and determination. I often say this in response to students complaining about learning R to do their data analysis. It’s actually got to the point where the unit co-ordinator suggested I get a t-shirt printed with “You weren’t born knowing how to walk” on the front and “So learn R” on the back.

One of the reasons I’m so keen to push new students into learning R is that while Excel can do some of the simpler calculations required in the first year of a science degree, it is often completely inadequate for data analysis as a professional scientist, or even in an advanced university course. I actually saw a senior researcher in a three-day Bayesian statistics course try to avoid using R to code a Gibbs sampler by getting it up and running in Excel. They managed it, but it took minutes to run what the rest of us could compute in a second (and it was for a trivially simple problem).
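
For a sense of scale, a Gibbs sampler for a simple problem is only a few lines of code. I don’t know which problem was used in that course, so this sketch assumes a standard bivariate normal with correlation rho, where both full conditionals are normal; it uses only Python’s standard library, and the seed, burn-in and number of draws are arbitrary choices.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_draws, burn_in=1000, seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Alternates draws from the two full conditionals:
        x | y ~ Normal(rho * y, 1 - rho**2)
        y | x ~ Normal(rho * x, 1 - rho**2)
    """
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    cond_sd = math.sqrt(1 - rho ** 2)  # random.gauss takes a standard deviation
    x = y = 0.0
    draws = []
    for i in range(burn_in + n_draws):
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if i >= burn_in:  # discard the burn-in period
            draws.append((x, y))
    return draws


def summarise(draws):
    """Sample means of x and y and their sample correlation."""
    n = len(draws)
    mx = sum(d[0] for d in draws) / n
    my = sum(d[1] for d in draws) / n
    sxy = sum((d[0] - mx) * (d[1] - my) for d in draws)
    sxx = sum((d[0] - mx) ** 2 for d in draws)
    syy = sum((d[1] - my) ** 2 for d in draws)
    return mx, my, sxy / math.sqrt(sxx * syy)


draws = gibbs_bivariate_normal(rho=0.8, n_draws=20000)
print("means and correlation:", summarise(draws))
```

The sample means should sit near 0 and the sample correlation near 0.8, and the whole run takes a fraction of a second.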

There are problems with Excel, such as its inability to accurately compute the standard deviation of a set of very large numbers because of its bizarre formulation. Apparently the secret to sane use of Excel is to use it only for data storage. This guiding principle means that I no longer manipulate my data in Excel; even with timestamp information, I’ll fire up the lubridate package to convert from one format to another. I’m slowly exploring the Hadleyverse, and that sort of approach is filtering through into SEB113, where we’re teaching the use of ggplot2 and reshape2 within RStudio. These are all powerful tools that simplify data analysis and avoid the hackish feel of much Excel-based analysis, where pivot tables are a thing and graphs are made by clicking and dragging a selection tool down the data (which can lead to some nasty errors).
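
Catastrophic cancellation is the usual explanation for this kind of failure. I can’t say exactly which formula any particular version of Excel uses, so treat this as a sketch of the general problem rather than a reverse-engineering of Excel: the one-pass “textbook” formula subtracts two huge, nearly equal numbers, while Welford’s algorithm, a standard stable alternative, updates the mean and squared deviations incrementally. The example data are the values 4, 7, 13 and 16 shifted by 10^12, whose sample variance is exactly 30.

```python
def variance_naive(data):
    """One-pass 'textbook' sample variance: (sum of squares - (sum)^2 / n) / (n - 1).

    Suffers catastrophic cancellation when the values are large relative to their spread.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    total = sum(data)
    total_sq = sum(v * v for v in data)
    return (total_sq - total * total / n) / (n - 1)


def variance_welford(data):
    """Welford's one-pass algorithm for the sample variance."""
    n, mean, m2 = 0, 0.0, 0.0
    for v in data:
        n += 1
        delta = v - mean          # deviation from the old mean
        mean += delta / n         # update the running mean
        m2 += delta * (v - mean)  # accumulate squared deviations (old x new)
    if n < 2:
        raise ValueError("need at least two values")
    return m2 / (n - 1)


data = [1e12 + 4, 1e12 + 7, 1e12 + 13, 1e12 + 16]  # sample variance is exactly 30
print("naive:  ", variance_naive(data))
print("welford:", variance_welford(data))
```

Taking the square root of either result gives the standard deviation; the damage is already done at the variance step.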

The fact that these powerful tools are free is another reason to choose R over Excel. I’m not on the “Open Source Software and provision of all code is mandatory” bandwagon that others seem to be on when it comes to replicable analysis. I agree it’s a worthwhile goal, but it’s not a priority for me. That said, I definitely support encouraging the use of free software (in both senses) in education on the grounds of equity of access.

I had a chat with some students in SEB113 yesterday about why we’re teaching everything in R, given that the SEB114 staff use a combination of Excel and MATLAB (and maybe other packages I don’t know about). If we were to teach analysis the way the SEB114 lecturers do it themselves, we’d have to teach multiple packages to multiple disciplines. Even setting aside the fact that everything we teach is implemented in R, the fact that R is free (unlike Excel and MATLAB), cross-platform (Excel on Linux? Try OpenOffice/LibreOffice) and extensible (MATLAB has toolboxes, Excel has add-ins, R has a nice package manager) was a big plus for students, who said that being able to work on assignments at home was valuable and that paying for software would make study difficult.

Convincing students to use R can be difficult, especially if they have no programming background, but ultimately they seem to accept that R is powerful, can do more than Excel and that writing reusable code makes future analysis easier. Convincing SEB114 academics that teaching their students to use R is a good idea is probably a harder sell, given that they’ve got years of experience with other tools. It’s still only semester 3 of the new Bachelor of Science course so we’ll have to see how this plays out over the years to come.

Timetabling and the potential for alternative delivery in SEB113

I’ve been pretty busy writing the analysis plan for the main paper from the UPTECH project and reorganising SEB113 workshops. We’ve had some meetings recently with QUT timetabling staff, which have led to discussions about how we can get students to enrol in a sensible pair of workshops and labs for both SEB113 and SEB114.

One of the biggest concerns when it comes to these paired subjects is making sure that people attend the labs and workshops in the right order and are working with the same groups across both subjects so that we can structure the teaching material. In SEB113 the preferred order of classes is Lectorial, Computer Lab, Collaborative Workshop. The lecture introduces the topic, the lab shows you how it’s implemented in R and the workshop gets you working in a group with others to solve a problem based on the topic.

The problem arises because QUT’s timetabling software produces a timetable that contains no clashes for the core first-year subjects (SEB101, SEB102, SEB113 and SEB114). Timetabling the lectures/lectorials for these units so that they don’t clash is a task in and of itself, and I’m impressed that the timetabling people have managed it (I remember taking two units for the applied physics co-major in the old B App Sc course where the lectures clashed). But a clash-free timetable doesn’t necessarily mean students can enrol in the class order we would prefer. It’s also unlikely that we can automatically combine a lab and a workshop into a single pair for students to enrol in, and it’s impractical to get a staff member to enrol students manually.
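
Checking whether a given lab and workshop respect the Lectorial, Computer Lab, Collaborative Workshop order is the easy part; offering only those valid pairs as enrolment options is what the timetabling system can’t readily do. The sketch below, with hypothetical slot data and helper names, lists the lab-workshop pairs that keep the preferred order for a given lectorial. It ignores clashes with other units, which is exactly the constraint the real timetabling software has to juggle.

```python
from datetime import time

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]


def week_key(slot):
    """Turn a (day, start time) slot into something sortable across the week."""
    day, start = slot
    return (DAYS.index(day), start)


def in_preferred_order(lectorial, lab, workshop):
    """True if the classes run Lectorial -> Computer Lab -> Collaborative Workshop."""
    return week_key(lectorial) < week_key(lab) < week_key(workshop)


def valid_pairs(lectorial, labs, workshops):
    """Every (lab, workshop) combination that keeps the preferred order."""
    return [(lab, ws) for lab in labs for ws in workshops
            if in_preferred_order(lectorial, lab, ws)]


# Hypothetical slots for one week
lectorial = ("Mon", time(9))
labs = [("Mon", time(14)), ("Thu", time(10))]
workshops = [("Tue", time(9)), ("Fri", time(13))]
for lab, ws in valid_pairs(lectorial, labs, workshops):
    print("lab", lab[0], lab[1], "-> workshop", ws[0], ws[1])
```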

It’s got me thinking a lot about flipped classrooms and other ways of overcoming the timetable difficulty. The benefit of the workshop for students is that they have a group to work with on a big task and they have two tutors to ask for help when they get stuck. I feel like this would be difficult to do outside a classroom without some sort of help-desk queueing system that is only open between certain times (and then you’ve still got the time restrictions). The computer labs can be done individually at any time, though, as they’re about exposure to code rather than solving a particular problem. In this instance, we could probably cut down on the number of computer labs required by encouraging students to do the lab in their own time before their workshop, which is in the spirit of flipped classrooms.

The last labs are in week 7 (this week!), so it’s not going to be an issue for much longer this semester. Semester 2 has fewer SEB113 enrolments (SEB114 isn’t offered), so it won’t be as big an issue then. Whether we change the timetabling system or turn the computer labs into programming consults (where, to get help, you must have attempted the lab) is something we can deal with a bit later. With the use of Echo360 being made mandatory in all lectures at QUT, the availability of recorded lectures makes it easier for students to go through the material at their own pace. With so many students in the subject, a large number of person-hours go into content delivery. I’m not sure we’re using that resource (labour) as effectively as we can, and changing the way we deliver the subject may help.
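
A help-desk queue that is only open between certain times, combined with the consult rule that you must have attempted the lab to get help, could look something like the sketch below. Everything here (class name, messages, opening hours) is hypothetical, and it’s a single-process, in-memory toy, not a design for a real booking system.

```python
from collections import deque
from datetime import time


class ConsultQueue:
    """Minimal help-desk queue: requests are accepted only while the desk is
    open, and only from students who have attempted the lab."""

    def __init__(self, opens, closes):
        self.opens = opens
        self.closes = closes
        self._queue = deque()

    def request_help(self, student, now, attempted_lab):
        if not (self.opens <= now < self.closes):
            return "closed"
        if not attempted_lab:
            return "attempt the lab first"
        if student in self._queue:
            return "already queued"
        self._queue.append(student)
        return f"queued at position {len(self._queue)}"

    def next_student(self):
        """Serve students first come, first served; None when nobody is waiting."""
        return self._queue.popleft() if self._queue else None


desk = ConsultQueue(opens=time(10), closes=time(12))
print(desk.request_help("alex", time(10, 30), attempted_lab=True))
```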