Last semester the SEB113 teaching team (myself, Ruth Luscombe, Iwona Czaplinski and Brett Fyfield) wrote a paper for the HERDSA conference about the relationship between student engagement and success. We collected data on the timing of students’ use of the adaptive release tool we developed, in which students confirm that they’ve seen some preparatory material before being given access to the lecture, computer lab and workshop material. We built a regression model relating the number of weeks of material students gave themselves access to and their end-of-semester marks (out of 100%); it showed that students who engaged more obtained better marks, where engagement also included active use of the Facebook group and attendance at workshop classes. I had assumed that we’d be able to get data on students’ maths backgrounds coming in, but with so many pathways into university, we don’t have background information on every student. QUT has set Queensland Senior Maths B as the assumed knowledge for SEB113 (and indeed the broader ST01 Bachelor of Science degree), and I’m interested in whether the level of maths students come in with has a bearing on how well they do over the course of the unit.
This semester, we decided that it’d be good to not just get a sense of the students’ educational backgrounds but to assess their level of mathematical and statistical skill. We designed a diagnostic to run in the first lecture that would canvass students on their educational background, their attitudes towards mathematics and statistics, and how well they could answer a set of questions that a student passing Senior Maths B should be able to complete. The questions were taken from the PhD thesis of Dr Therese Wilson and research published by Dr Helen MacGillivray (both at QUT), so I’m fairly confident we’re asking the right questions. One thing I really like about Dr MacGillivray’s diagnostic tool, a multiple choice test designed for engineering students, is that each incorrect choice is wrong for a very specific reason, such as not getting the order of operations right or not recognising something as a difference of squares.
I’m about to get the scanned and processed results back from the library, and it turns out that a number of students didn’t put their name or student number on the answer sheet. Some put their names down but didn’t fill in the circles, so the machine that scans the answer sheets can’t determine who the student is, and it’ll take some manual data entry (probably on my part) to ensure that as many students as possible get the results of their diagnostic. So while I’ll have a good sense of the class overall, and of how we need to support them, it’ll be harder than it should be to ensure that the people who need help can be targeted for it.
Next semester I’ll try to run the same sort of thing, perhaps with a few modifications. We’ll need to be very clear about entering student numbers and names so that we can get everyone their own results. It’d be good to write a paper that follows on from our HERDSA paper and includes more information about educational background. It might also be interesting to check the relationship between students’ strength in particular topics (e.g. calculus, probability) and their marks on the corresponding items of assessment. Getting it right next semester and running it again in Semester 1 2017 would be a very useful way of gauging whether students who are weak in particular topics struggle to do well on certain pieces of assessment.
One of the standard population dynamics models that I learned in my undergraduate mathematical modelling units was the Lotka-Volterra equations. These represent a very simple set of assumptions about populations, and while they don’t necessarily give physically realistic population trajectories, they’re an interesting introduction to the idea that systems of differential equations don’t necessarily have an explicit solution.
The assumptions are essentially: prey grow exponentially in the absence of predators; predation happens at a rate proportional to the product of the predator and prey populations; the birth of predators depends on the product of the predator and prey populations; and predators die off exponentially in the absence of prey. In SEB113 we cover non-linear regressions and the mathematical models that lead to them, and then show that mathematical models don’t always yield a nice function. We look at the equilibrium solution and show that trajectories orbit around it rather than tending towards (or away from) it. We also look at what happens to the trajectories as we change the relative sizes of the rate parameters.
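Written out, those assumptions give the familiar system (the rate parameter names here are my own choice of notation, with X the prey and Y the predator population):

```latex
\begin{aligned}
\frac{dX}{dt} &= \alpha X - \beta X Y \\
\frac{dY}{dt} &= \delta X Y - \gamma Y
\end{aligned}
```

Each term matches one assumption: exponential prey growth (αX), predation (−βXY), predator births (δXY) and exponential predator death (−γY).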
Last time we did the topic, I posted about using the logistic growth model for our Problem Solving Task and it was pointed out to me that the model has a closed form solution, so we don’t explicitly need to use a numerical solution method. This time around I’ve been playing with using Euler’s method inside JAGS to fit the Lotka-Volterra system to some simulated data from sinusoidal functions (with the same period). I’ve put a bit more effort into the predictive side of the model, though. After obtaining posterior distributions for the parameters (and initial values) I generate simulations with lsode in R, where the parameter values are sampled from the posteriors. The figure below shows the median and 95% CI for the posterior predictive populations as well as points showing the simulated data.
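For readers who want to see the forward-simulation step on its own, here’s a minimal sketch of an Euler discretisation of the Lotka-Volterra system (in Python rather than JAGS/R, and with made-up parameter values; the names α, β, δ, γ are just my notation):

```python
import numpy as np

def lv_euler(alpha, beta, delta, gamma, x0, y0, dt=0.01, t_max=30.0):
    """Euler discretisation of the Lotka-Volterra system.
    Returns arrays of times, prey (x) and predator (y) populations."""
    n = int(round(t_max / dt))
    x = np.empty(n + 1)
    y = np.empty(n + 1)
    x[0], y[0] = x0, y0
    for i in range(n):
        # One Euler step for each equation of the system
        x[i + 1] = x[i] + dt * (alpha * x[i] - beta * x[i] * y[i])
        y[i + 1] = y[i] + dt * (delta * x[i] * y[i] - gamma * y[i])
    return np.linspace(0.0, t_max, n + 1), x, y

# Hypothetical parameter values and initial populations, for illustration only
t, prey, pred = lv_euler(alpha=1.0, beta=0.1, delta=0.075, gamma=1.5,
                         x0=10.0, y0=5.0)
```

In the actual fit, each Euler step sits inside the JAGS model, with the rate parameters and initial values given priors rather than fixed.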
The predictions get more variable as time goes on, as the uncertainty in the parameter values changes the period of the cycles that the Lotka-Volterra system exhibits. This reminds me of a chat I was having with a statistics PhD student earlier this week about the sensitivity of models to data. The student’s context is clustering of data using overfitted mixtures, but I ended up digressing and talking about Edward Lorenz’s discovery of chaos theory through a meteorological model that was very sensitive to small changes in parameter values. The variability in the parameter values in the posterior gives rise to the same behaviour, as both Lorenz’s work and my little example in JAGS involve variation in input values for deterministic modelling. Mine was deliberate, though, so it isn’t as exciting or groundbreaking a discovery as Lorenz’s, but we both come to the same conclusion: forecasting is of limited use when your model is sensitive to small variations in parameters. As time goes on, my credible intervals will likely end up being centred on the equilibrium solution, and the uncertainty in the period of the solution (due to changing coefficient ratios) will result in very wide credible intervals.
It’s been a fun little experiment again, and I’m getting more and more interested in combining statistics and differential equations, as it’s a blend of pretty much all of my prior study. The next step would be to use something like MATLAB with a custom Gibbs/Metropolis-Hastings scheme to bring in more of the computational mathematics I took. It’d be interesting to see if there’s space for this sort of modelling in the Mathematical Sciences School’s teaching programs as it combines some topics that aren’t typically taught together. I’ve heard murmurings of further computational statistics classes but haven’t been involved with any planning.
There is no textbook for SEB113 – Quantitative Methods in Science.
It’s not that we haven’t bothered to prescribe one, it’s that no one seemed to be taking the same approach we decided upon two years ago when the planning for the unit started. There are books on statistics for chemistry, statistics for ecology, statistics for physics, statistics for mathematics, etc. but trying to find a general “statistics for science” book that focuses on modelling rather than testing has been difficult.
That said, there are some amazing resources out there if you know where to look, not just for learning statistics but for teaching statistics. One of the most useful that we’ve come across is “Teaching Statistics”, by Andrew Gelman and Deborah Nolan. The book itself is full of advice on things like groupwork, topic order and the structure of learning activities, but my favourite thing so far is the paper helicopter experiment.
Last week I gave my first lecture for the semester to the SEB113 students. While they tend not to have a particularly strong mathematics background, I got some very positive feedback on how much they enjoyed learning about mathematical modelling. We revised differentiation and what derivatives are, and then jumped into formulating differential equations from words that represent the assumptions a model makes.
The bulk of that week’s lecture shows where the non-linear regression models we used in the previous week (first order compartment, asymptotic, biexponential) come from. To do this we have a chat about exponential growth and decay models, as they are some of the easiest differential equation models to deal with. I show them how we solve the exponential model exactly and then make it clear that I don’t expect them to solve these equations in this subject. We show the solutions to the DE systems and make it very clear that the non-linear regression models are the solutions to differential equations that represent different assumptions.
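For reference, the exponential model and its exact solution, the simplest case of the derivations shown in that lecture, are:

```latex
\frac{dN}{dt} = rN \quad\Longrightarrow\quad N(t) = N_0 e^{rt}
```

with growth for r > 0 and decay for r < 0, and N₀ the initial population.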
We finish the lecture off with a section on how we can’t always get a “pen and paper” solution to differential equations, and so sometimes we either simplify the system to one we can solve (alluding to perturbation methods) or hand it to a numerical solver (alluding to computational mathematics). Because it’s how I learned about numerical solutions to DEs, I showed the students the Lotka-Volterra model and discussed why we can’t solve for X(t) and Y(t) explicitly and so have to use numerical methods. For different parameter values we get variations on the same behaviour: cyclic patterns, with prey population growth followed by predator population growth, followed by overconsumption of prey leading to fewer predators being born to replace the dying. Many students seemed to enjoy investigating this model in the workshops, as it’s quite different to everything we’ve learned so far. We solve the system via the deSolve package in R, but we also introduce the students to Euler’s method and discuss numerical instability and the accumulation of numerical error.
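The error-accumulation point can be made concrete with a minimal sketch (a Python example of my own, not the workshop code): Euler’s method is first order, so halving the step size roughly halves the global error.

```python
import math

def euler_decay(k, n0, dt, t_end):
    """Euler's method for dN/dt = -k N; returns the approximation to N(t_end)."""
    n = n0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        n = n + dt * (-k * n)
    return n

# Exact solution for N0 = 1, k = 1 at t = 2
exact = math.exp(-2.0)
err_coarse = abs(euler_decay(1.0, 1.0, 0.1, 2.0) - exact)
err_fine = abs(euler_decay(1.0, 1.0, 0.05, 2.0) - exact)
# First-order method: the error ratio should be close to 2
ratio = err_coarse / err_fine
```

The errors at each step don’t just sit there; they compound, which is why the discussion of step size matters even for a system as tame as exponential decay.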
I finish off the lecture with a chat about how regression tends to make assumptions about the form of the mean relationship between variables so that we can do parameter estimation, and how differential equations give us a system we can solve to obtain that mean relationship. I state that although we can solve the DE numerically while simultaneously estimating the parameters, doing so is well outside the scope of the course.
I had a bit of time this morning to spend on next week’s lecture material (linear algebra) so decided to have a go at numerical estimation for the logistic growth model and some data based on the Orange tree circumference data set in R with JAGS/rjags. It’s the first time I’ve had a go at combining regression and numerical solutions to DEs in the same code, so I’ve only used Euler’s method. That said, I was very happy with the solution and the code is provided below the cut.
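For reference, the logistic growth model and its closed form solution (the reason a numerical method isn’t strictly needed for this one) are:

```latex
\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)
\quad\Longrightarrow\quad
N(t) = \frac{K}{1 + \left(\dfrac{K - N_0}{N_0}\right) e^{-rt}}
```

so fitting it with Euler’s method inside JAGS is deliberately doing things the hard way, as a proof of concept for models that don’t have such a solution.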
“R: The Good Parts” is an attempt to showcase the best way to do things in R. I’m not yet at the stage of dealing with absolutely massive data sets but things will be heading that way for me if aerosol samplers continue to measure at higher frequencies. Left out of the article is a discussion of dplyr; I’m still using functions from the apply family! Maybe I should also get used to using data.table. (Update: I’m now using data.table and its syntax to apply functions across grouping levels that I’ve set as keys. This is amazing).
While we’ve been incorporating a few of the mathematical needs of SEB114 into SEB113, it looks like we may need to go a bit further with incorporating the R needs. I hadn’t really thought about plotting a specific function (other than a line y = ax + b) in the workshops, but it looks like a few earth sciences students need to plot the function πx/(1 + x)². So we’ll have to take stock over the next six months of what the experimental science lecturers want to put in their units and how we can help support that (and also how we can get the science lecturers to help reinforce statistical modelling over statistical testing).
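As a sketch of what the students would need to do (in Python here rather than R, with an arbitrary grid of x values):

```python
import numpy as np

def f(x):
    """The earth-science function pi * x / (1 + x)^2."""
    return np.pi * x / (1.0 + x) ** 2

# Evaluate on a grid; plotting is then one line in any graphics system
x = np.linspace(0.0, 10.0, 201)
y = f(x)
# The function peaks at x = 1, where f(1) = pi / 4
```

The only conceptual step beyond plotting a line is vectorised evaluation of the function over the grid, which is exactly the kind of R skill we’d need to fold into the workshops.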
Today’s the final day of EMAC2013, starting with Joe Monaghan’s talk on numerical methods for the dynamics of fluids that contain particles (in 20 minutes, so I’ll be brief).
I attended yesterday’s “Education” session and saw some interesting things about how maths education is going around the country. The University of Tasmania is engaging with TasTAFE to deliver maths courses to engineering diploma students in order to prepare them for the mathematics they’ll encounter in their bachelor’s degrees. UTS is doing some interesting analysis of their maths course results to rejig the prerequisite pathways for their maths courses. A particularly interesting case was the use of a first year linear algebra course as a predictor of performance in a second year stochastic models subject that previously only had a first year probability course as its prerequisite.
I chaired and presented in yesterday’s “Environment” session, presenting the mathematics behind the personal sampling that we’ve been working on with the UPTECH project. I got quite a number of good questions and was overall quite happy with the talk I gave. The other talks in the session were about: using approximations to a sum of Pareto distributions, developed by actuaries, to determine whether extreme values in biomass luminescence were real or artifacts from the new sensors; and incorporating insolation into global climate models.
Josef Barnes (Griffith) won the student prize (for, I assume, his talk on cardiac geometries), with honourable mentions for Kristen Harley and Lisa Mayo (QUT) and Laith Hermez. Bill Blyth, for whom the prize is named, pointed out that the quality of the student talks at EMAC2013 is getting higher year after year. This can only be good news for the applied mathematics sector in Australia (and New Zealand), as these students will likely go on to academic positions and generate high quality research.
David Lovell gave a great talk yesterday about multi-, inter-, trans- and ante-disciplinary research. I’m reading the article he referred to yesterday about the way disciplines will have to deal with each other and knowledge sharing over the coming years.
Thiago Martins has posted a neat little tutorial about using R to calculate and visualise Principal Components Analysis, using Fisher’s Iris data. PCA is something I’ve struggled with as I’ve gone further into statistics, as it comes across as being based on mathematics rather than statistics. I’d like to learn more about the Indian Buffet Process and associated non-parametric Bayesian methods, but if I’m going to be looking at long and wide data sets (say, the UPTECH questionnaire data) I’d like to have somewhere to start. It looks like this may provide that.
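This isn’t Thiago’s code (his tutorial is in R), but the linear algebra PCA rests on fits in a few lines; here’s a numpy sketch with random stand-in data:

```python
import numpy as np

def pca(X):
    """PCA via eigendecomposition of the sample covariance matrix.
    Returns (eigenvalues, eigenvectors, scores), sorted by variance explained."""
    Xc = X - X.mean(axis=0)            # centre each column
    cov = np.cov(Xc, rowvar=False)     # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # eigendecomposition (symmetric matrix)
    order = np.argsort(vals)[::-1]     # largest variance first
    vals, vecs = vals[order], vecs[:, order]
    return vals, vecs, Xc @ vecs       # scores = data projected onto components

# Random stand-in data, purely for illustration
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
vals, vecs, scores = pca(X)
```

The scores are uncorrelated by construction, with variances equal to the eigenvalues, which is the property that makes PCA feel like mathematics dressed up as statistics.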
Rasmus Bååth’s done a tutorial on Laplace Approximations in R (hat tip to Matt Moores for this one). Laplace Approximations are an alternative to MCMC simulation that can provide good approximations to well-behaved posterior densities in a fraction of the time. The tutorial deals with the issue of reparameterisation for when you’ve got parameters which have bounded values (such as binomial proportions). As a piece of trivia, Thiago (above) is based at NTNU where R-INLA is developed.
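As a sketch of the idea in Python (the tutorial itself is in R, and the data values here are made up): find the posterior mode of the logit-transformed proportion, measure the curvature there, and treat the posterior as Gaussian in the transformed space.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Made-up data: s successes out of n trials, flat prior on p,
# reparameterised as theta = logit(p) so the parameter is unbounded
s, n = 7, 20

def neg_log_post(theta):
    p = 1.0 / (1.0 + np.exp(-theta))
    # binomial log-likelihood plus the log-Jacobian of the logit transform
    return -(s * np.log(p) + (n - s) * np.log(1.0 - p) + np.log(p * (1.0 - p)))

res = minimize_scalar(neg_log_post)   # posterior mode in theta-space
mode = res.x
# Curvature at the mode via a central finite difference
h = 1e-4
hess = (neg_log_post(mode + h) - 2 * neg_log_post(mode)
        + neg_log_post(mode - h)) / h**2
sd = 1.0 / np.sqrt(hess)   # Laplace approximation: theta ~ N(mode, sd^2)
```

The reparameterisation is the point Bååth’s tutorial stresses: a Gaussian approximation sits much more comfortably on an unbounded parameter than on one confined to (0, 1).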
I’m at the EMAC2013 conference this week. We’re about halfway through day one (of three) of the talks and there’s already been some fascinating stuff. Professor Robert Mahony (ANU) gave a talk showing that the development of more advanced unmanned aerial vehicles (UAVs, drones) involves some quite complex but elegant mathematics, involving Lie group symmetries, rather than just coming up with cooler robots. Hasitha Nayanajith Polwaththe Gallage (QUT) showed some really interesting particle method (mesh-free) modelling where forces and energies were used to determine the shape of a red blood cell that had just ejected its nucleus.