Combining differential equations and regression

Last week I gave my first lecture for the semester to the SEB113 students. While they tend not to have a particularly strong mathematics background, I got some very positive feedback on how much they enjoyed learning about mathematical modelling. We revised differentiation and what derivatives are, then jumped into formulating differential equations from words that represent the assumptions the model makes.

The bulk of that week’s lecture shows where the non-linear regression models we used in the previous week (first order compartment, asymptotic, biexponential) come from. To do this we have a chat about exponential growth and decay, among the easiest differential equation models to deal with. I show them how we solve the exponential model exactly, and mention that I don’t expect them to solve these equations in this subject. We show the solutions to the DE systems and make it very clear that the non-linear regression models are the solutions to differential equations that represent different assumptions.
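For reference, the exponential model and its exact solution via separation of variables (my notation here, sketching the step done in the lecture):

\[
\frac{dN}{dt} = kN
\;\Longrightarrow\;
\int \frac{1}{N}\,dN = \int k\,dt
\;\Longrightarrow\;
N(t) = N_0 e^{kt},
\]

where \(N_0\) is the initial value and the sign of \(k\) gives growth (\(k > 0\)) or decay (\(k < 0\)).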

We then move on to how we can’t always get a “pen and paper” solution to differential equations, so sometimes we either simplify the system to one we can solve (alluding to perturbation methods) or hand it to a numerical solver (alluding to computational mathematics). Because it’s how I learned about numerical solutions to DEs, I showed the students the Lotka-Volterra model and discussed why we can’t solve for X(t) and Y(t) analytically and so have to use numerical methods. For different parameter values we get variations on the same behaviour: cyclic patterns, with prey population growth followed by predator population growth, then overconsumption of prey leading to fewer predators being born to replace the dying. Many students seemed to enjoy investigating this model in the workshops, as it’s quite different to everything we’ve learned so far. We solve it with the deSolve package in R, but we also introduce the students to Euler’s method and discuss numerical instability and the accumulation of numerical error.
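For anyone playing along at home, here’s a minimal deSolve version of the predator-prey system, plus a coarse hand-rolled Euler solution to show the error accumulating. The parameter values and initial conditions are illustrative choices of mine, not necessarily the ones from the workshops:

```r
library(deSolve)

# Lotka-Volterra predator-prey model
lotka_volterra <- function(t, state, pars) {
  with(as.list(c(state, pars)), {
    dX <- alpha * X - beta * X * Y    # prey: growth minus predation
    dY <- delta * X * Y - gamma * Y   # predators: births from prey, natural death
    list(c(dX, dY))
  })
}

pars  <- c(alpha = 1, beta = 0.1, delta = 0.075, gamma = 1.5)
state <- c(X = 10, Y = 5)
times <- seq(0, 50, by = 0.01)

out <- ode(y = state, times = times, func = lotka_volterra, parms = pars)
matplot(out[, "time"], out[, c("X", "Y")], type = "l", lty = 1,
        xlab = "time", ylab = "population")

# A hand-rolled Euler's method with a deliberately coarse step, to show
# how numerical error accumulates relative to the deSolve solution
h <- 0.2
n <- 50 / h
X <- numeric(n + 1); Y <- numeric(n + 1)
X[1] <- 10; Y[1] <- 5
for (i in 1:n) {
  X[i + 1] <- X[i] + h * (1 * X[i] - 0.1 * X[i] * Y[i])
  Y[i + 1] <- Y[i] + h * (0.075 * X[i] * Y[i] - 1.5 * Y[i])
}
lines(seq(0, 50, by = h), X, lty = 2)  # drifts away from the ode() prey curve
```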

I finish off the lecture with a chat about how regression tends to make assumptions about the form of the mean relationship between variables so we can do parameter estimation, and that differential equations give us a system we can solve to obtain that mean relationship. I state that while we can solve the DE numerically while simultaneously estimating the parameters, it is way outside the scope of the course.

I had a bit of time this morning to spend on next week’s lecture material (linear algebra) so decided to have a go at numerical estimation for the logistic growth model and some data based on the Orange tree circumference data set in R, using JAGS/rjags. It’s the first time I’ve had a go at combining regression and numerical solutions to DEs in the same code, so I’ve only used Euler’s method. That said, I was very happy with the solution.
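Below is a minimal sketch of that kind of model: an Euler discretisation of logistic growth, dN/dt = rN(1 - N/K), embedded in a JAGS model and fitted to a single tree from the Orange data. The priors, the single-tree simplification and the variable names are illustrative choices of mine rather than a definitive implementation:

```r
library(rjags)

# One tree from the built-in Orange data set, for simplicity
orange1 <- subset(Orange, Tree == 1)

# Logistic growth dN/dt = r * N * (1 - N / K), stepped forward between
# observation times with Euler's method inside the JAGS model
model_string <- "
model {
  mu[1] <- N0
  for (t in 2:T) {
    mu[t] <- mu[t-1] + dt[t-1] * r * mu[t-1] * (1 - mu[t-1] / K)
  }
  for (t in 1:T) {
    y[t] ~ dnorm(mu[t], tau)   # circumferences observed around the Euler path
  }
  N0  ~ dunif(0, 100)    # initial circumference (mm); illustrative prior
  r   ~ dunif(0, 0.01)   # growth rate per day; illustrative prior
  K   ~ dunif(100, 500)  # asymptotic circumference; illustrative prior
  tau ~ dgamma(0.01, 0.01)
}
"

jags_data <- list(y  = orange1$circumference,
                  dt = diff(orange1$age),
                  T  = nrow(orange1))

jm <- jags.model(textConnection(model_string), data = jags_data, n.chains = 3)
update(jm, 5000)  # burn-in
post <- coda.samples(jm, c("N0", "r", "K"), n.iter = 10000)
summary(post)
```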

A few R things

“R: The Good Parts” is an attempt to showcase the best way to do things in R. I’m not yet at the stage of dealing with absolutely massive data sets but things will be heading that way for me if aerosol samplers continue to measure at higher frequencies. Left out of the article is a discussion of dplyr; I’m still using functions from the apply family! Maybe I should also get used to using data.table. (Update: I’m now using data.table and its syntax to apply functions across grouping levels that I’ve set as keys. This is amazing.)
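To illustrate the pattern in that update, here’s a sketch on made-up data (the site/hour/pm columns are invented examples, not real measurements):

```r
library(data.table)

# Made-up aerosol-ish data
dt <- data.table(site = rep(c("A", "B"), each = 50),
                 hour = rep(1:10, times = 10),
                 pm   = rlnorm(100))

# Set the grouping levels as keys, then apply functions within each group
setkey(dt, site, hour)
dt[, .(mean_pm = mean(pm), max_pm = max(pm)), by = key(dt)]
```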

While we’ve been incorporating a few of the mathematical needs of SEB114 into SEB113 it looks like we may need to go a bit further with incorporating the R needs. I hadn’t really thought about plotting a specific function (other than a line y = ax + b) in the workshops but it looks like a few earth sciences students need to plot the function πx/(1 + x)². So we’ll have to take stock over the next six months of what the experimental science lecturers want to put in their units and how we can help support that (also how we can get the science lecturers to help reinforce statistical modelling over statistical testing).
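Plotting a function like that is only a couple of lines in R; for example (the range here is chosen arbitrarily):

```r
library(ggplot2)

# f(x) = pi * x / (1 + x)^2 over an arbitrarily chosen range
f <- function(x) pi * x / (1 + x)^2
ggplot(data.frame(x = c(0, 10)), aes(x)) +
  stat_function(fun = f) +
  labs(x = "x", y = expression(pi * x / (1 + x)^2))

# or, in base R:
curve(pi * x / (1 + x)^2, from = 0, to = 10)
```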

Timetabling and the potential for alternative delivery in SEB113

I’ve been pretty busy writing the analysis plan for the main paper from the UPTECH project and reorganising SEB113 workshops. We’ve had some meetings recently with QUT timetabling people, which have led to discussions about how we try to get students to enrol in a sensible pair of workshops and labs for both SEB113 and SEB114.

One of the biggest concerns when it comes to these paired subjects is making sure that people attend the labs and workshops in the right order and are working with the same groups across both subjects so that we can structure the teaching material. In SEB113 the preferred order of classes is Lectorial, Computer Lab, Collaborative Workshop. The lectorial introduces the topic, the lab shows you how it’s implemented in R, and the workshop gets you working in a group with others to solve a problem based on the topic.

The problem comes about because QUT’s timetabling software provides a timetable which contains no clashes for the core first year subjects (SEBs 101, 102, 113, 114). Timetabling the lectures/lectorials for these units so that they don’t clash is a task in and of itself, and I’m impressed that the timetabling people have managed to make sure these subjects don’t clash (I remember taking two units for the applied physics co-major in the old B App Sc course where the lectures clashed). But a non-clashing timetable doesn’t necessarily mean students can enrol in the class order that we would prefer. It’s also unlikely that we can automatically combine a lab-workshop pair as one thing to be enrolled in, and it’s impractical to try to get a staff member to enrol students manually.

It’s got me thinking a lot about flipped classrooms and other ways of overcoming the timetable difficulty. The benefit of the workshop for students is that they have a group to work with on a big task and they have two tutors to ask for help when they get stuck. I feel like this would be difficult to do outside a classroom without some sort of help-desk queueing system that is only open between certain times (and then you’ve still got the time restrictions). The computer labs can be done individually at any time, though, as they’re about exposure to code rather than solving a particular problem. In this instance, we could probably cut down on the number of computer labs required by encouraging students to do the lab in their own time before their workshop, which is in the spirit of flipped classrooms.

The last labs are in week 7 (this week!), which means it’s not going to be an issue much longer this semester. Semester 2 has fewer SEB113 enrolments (SEB114 isn’t offered) so it’s not going to be as big an issue then. Whether we go with changing the timetabling system or modify the computer labs to become programming consults (where, to get help, you must have attempted the lab) is something we can deal with a bit later. With the use of Echo360 now mandatory in all lectures at QUT, the availability of recorded lectures makes it easier for students to go through the material at their own pace. With so many students in the subject, there’s a large number of person hours going into content delivery. I’m not sure we’re using that resource (labour) as effectively as we can, and changing the way we deliver the subject may help.

Science in context – my context

One of the first year units that QUT has introduced in the new Bachelor of Science program is SEB101 – Science in Context. The subject aims to impress upon students the idea that science happens as part of a larger community and that how and why research is conducted relies on interactions with that community.

I received an email last week from a former SEB113 student of mine, Kathryn Turner, asking if she could interview me for the SEB101 Portfolio about the work I do as a research scientist. Kathryn and I organised to sit down for a chat this afternoon to discuss what I do, what relevance it has to the community and how the community sees the work we do.

We spent most of our fifteen minutes talking about the UPTECH project and how my work, statistical analysis for the various papers, is part of a large, interdisciplinary project that allows me to work with many different sources of data and do different analyses. I mentioned that I initially studied mathematical modelling, focussing on computational fluid dynamics, and that I got involved in this research project because my primary supervisor (Professor Lidia Morawska) lectured an elective that I took in my undergrad (Global Energy Balance and Climate Change). I went to have a chat with her after I’d finished Honours about what sort of PhD projects she might have available (writing about this now, it feels like a lifetime ago; it was only 2008) and she was in the process of planning UPTECH and recruiting people. I was offered the chance to apply cool mathematical techniques to an interesting environmental health problem that had links to transport planning. Sign me up!

We also talked about the ethics side of the project (doing health diagnostic measurements with students, having them take a health history and demographics survey home, etc.) and how QUT makes sure we’re very careful with this sort of thing. I’m glad I didn’t have to do the ethics application for the project.

Kathryn asked what the schools thought about having scientists come in and work with the kids. From what I understand, the schools were quite accepting and the kids were excited about the prospect of being involved with the personal sampling aspect; we also handed out badges that say “I’m doing SCIENCE” to the kids who were part of the study.

On a bit of a tangent (we didn’t discuss this), I think it’s good to have scientists seen as regular people who have decided to pursue science, and to show that science isn’t just lab work. Fermilab ran a really interesting project a few years ago about kids’ perceptions of scientists: they talked to some seventh graders and got them to describe and draw what they thought a scientist was before and after meeting a group of physicists who worked at the lab. The UPTECH members who went to the schools to do the measurements represent a very multicultural group, including (but not limited to) people of Iranian, Chinese, Egyptian, Malaysian, and Northern and Eastern European descent, and included both men and women. I hope that one of the outcomes of having such a diverse group involved with the field work was that the students saw that scientists aren’t all old, white men with frizzy, greying hair, a lab coat and glasses.

After we’d wrapped up the interview, Kathryn said it was interesting to learn a bit more about the research career of a lecturer and seemed quite interested in the various projects that I get to work on. For my part, I found it a really interesting interview because I don’t often get asked about the ethical and community implications of my work. While I do spend my days sitting in front of a computer running statistical analyses, I am actually a research scientist who relies on the support of the public both through my funding and through social acceptance that looking at the health impact of air quality is valuable.

Posterior samples

NTNU in Trondheim, Norway, has five PhD fellowships open.

Visualising homicide rates in Mexico using R and GitHub (via Probability and Statistics Blog).

If you’re in Melbourne, Australia, you should consider popping along to Laborastory at the Cider House (Brunswick), where five scientists get up on the first Tuesday of the month to tell the stories of the heroes and history of their field. A friend of mine went along this week and enjoyed it immensely.

SEB113 has started again. I’ve already done five workshops (I have three a week), introducing a whole new cohort of students to R, RStudio and ggplot. We did some paper plane throwing last week and had a look at how simple use of faceting, colouring and stacking histograms can reveal both overall variation and group-to-group variation. A few students are still bewildered by the idea of writing code to make pictures but they recognise that it’s just a case of needing practice.
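Something along these lines, with simulated throws (the groups, means and spreads below are invented for illustration):

```r
library(ggplot2)

# Simulated paper plane throw distances for three invented groups
planes <- data.frame(
  group    = rep(c("Group 1", "Group 2", "Group 3"), each = 30),
  distance = c(rnorm(30, 8, 2), rnorm(30, 10, 3), rnorm(30, 7, 1.5))
)

# Stacked, coloured histogram: overall variation in a single panel
ggplot(planes, aes(x = distance, fill = group)) +
  geom_histogram(bins = 15)

# Faceted histograms: group-to-group variation, panel by panel
ggplot(planes, aes(x = distance)) +
  geom_histogram(bins = 15) +
  facet_wrap(~ group)
```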

My reading list at the moment

While my reading list continues to grow longer and longer, I have a stack of books on my desk that I hope to get through in a timely manner.

Always on the lookout for a good general reference for Bayesian statistics, I’ve borrowed a copy of Congdon’s “Bayesian Statistical Modelling”. This has some really nice examples in it and covers topics I’m interested in such as spatial statistics and splines.

I want to try my hand at understanding Variational Bayes, as I think it’ll be useful for a Discovery Project we’re submitting. To this end, the monolithic “Probabilistic Graphical Models” is sitting there, taunting me.

One of my PhD students is taking his first steps into advanced statistics, having completed a Coursera course in data analysis (his second course starts today). Jim Albert’s “Bayesian Computation with R” kind-of assumes you’ll be writing code rather than using packages, but I found it a useful way to wrap my head around some concepts.

And as an early birthday present I got a copy of Nate Silver’s “The Signal and the Noise”. I’m already about 30 pages in and am quite impressed with how upfront he is about his belief in subjective Bayesianism as a means of inferring and predicting. Christian Robert reviewed the book a year ago and has some interesting thoughts on Silver’s approach to statistics.

I want to get a copy of Gelman’s BDA v3.

Posterior samples

It’s still ARC writing season, so that’s been taking up quite a bit of my time recently.

A coworker from Maths mentioned she’d started using Scrivener to storyboard her papers. Apparently it can be used with tools like git, Google Drive and Dropbox to work collaboratively, but you’ve got to be careful of conflicted copies going undiscovered. Another coworker from Maths isn’t so impressed by Scrivener, noting that if you want to use it with version control software you’re better off with LaTeX anyway. At US$40 a licence I’m a bit reluctant to make Scrivener part of my workflow, and there’s no way I can ask collaborators to fork out that kind of money.

I keep being impressed by Matt Wand, Jan Luts and Tamara Broderick’s work on real-time semiparametric regression. I saw Wand present this work at Bayes on the Beach just over a year ago. I can’t, for the life of me, wrap my head around mean-field variational Bayes, though. Perhaps it’s that I’ve never had to deal with calculus of variations, and that I got into inference beyond linear models through MCMC in WinBUGS rather than through machine learning. I’ve got a few more books on my desk about statistical theory now, including Congdon’s “Bayesian Statistical Modelling”, Koller and Friedman’s “Probabilistic Graphical Models”, and Casella and Berger’s “Statistical Inference”.