I’ve been using Twitter for reading, and our QUT Maths/Stats Slack domain for discussing maths, stats, data science, etc., over the last few months, so the way I use social media for work has changed a lot and I’ve not been blogging as often. In any case, I figure I come across enough random stuff that it would be good to restart the info dump that is Posterior samples.
Wilson et al. – Good Enough Practices in Scientific Computing. I’ve noticed that I’m a lot more evangelical these days about people using git to work collaboratively on R-based analyses. Whether your collaborators are a few desks or a few hundred kilometres away, getting your work up on a private GitHub repository is going to make it easier for us to work together.
I probably should have put this post up earlier because it’s now a huge collection of stuff from the last month. Here we go!
It appears that Hilary Parker and I have similar (but by no means identical) work setups for doing stats (or at least we did two years ago). It’s never too late to come up with a sensible way of organising your work and collection of references/downloaded papers.
Applied statisticians should probably teach scientists what it is we do, rather than just the mathematics behind statistics. This is a difference I’ve noticed between SEB113 and more traditional statistics classes: we spend a lot less time discussing F distributions and a lot more time on model development and visualisation.
Speaking of visualisation, here’s a really great article on visualisation and how we can use small multiples and colour, shape, etc. to highlight the interesting differences so that it’s very clear what our message is.
SEB113 students really seemed to enjoy looking at mathematical modelling last week. The Lotka-Volterra equations continue to be a good teaching tool. A student pointed out that when reviewing the limit idea behind derivatives, it’d be useful to illustrate it by approximating the circumference of a circle with a polygon. So I knocked this up:
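The graphic itself isn’t reproduced here, but the same limit can be sketched in Python (rather than the R we use in SEB113). A regular n-sided polygon inscribed in a circle of radius r has sides of length 2r sin(π/n), and one circumscribed about the circle has sides of length 2r tan(π/n), so the two perimeters squeeze the circumference 2πr from below and above as n grows. The function names are my own, for illustration only.

```python
import math


def inscribed_perimeter(n_sides: int, radius: float = 1.0) -> float:
    """Perimeter of a regular polygon inscribed in a circle.

    Each side is a chord subtending an angle of 2*pi/n at the centre,
    so its length is 2 * r * sin(pi / n).
    """
    return n_sides * 2 * radius * math.sin(math.pi / n_sides)


def circumscribed_perimeter(n_sides: int, radius: float = 1.0) -> float:
    """Perimeter of a regular polygon circumscribed about a circle.

    Each side touches the circle at its midpoint, so its length is
    2 * r * tan(pi / n).
    """
    return n_sides * 2 * radius * math.tan(math.pi / n_sides)


if __name__ == "__main__":
    circumference = 2 * math.pi  # unit circle
    for n in (4, 8, 16, 64, 256, 1024):
        lower = inscribed_perimeter(n)
        upper = circumscribed_perimeter(n)
        print(f"n = {n:4d}: {lower:.6f} < {circumference:.6f} < {upper:.6f}")
```

As n doubles, the gap between the two bounds shrinks by roughly a factor of four, which is a nice way to show students that "getting closer and closer" has a concrete, measurable meaning.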
This week I showed in the workshop how Markov chains are a neat application of linear algebra for working with probabilities. We used this interactive visualisation to investigate what happens as the transition probabilities change.
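The linear algebra can be sketched in a few lines of Python. This is a minimal illustration assuming a hypothetical two-state "sunny/rainy" chain (not the example from the visualisation): the distribution over states is a row vector, each step multiplies it by the transition matrix, and for a two-state chain with switching probabilities p and q the long-run (stationary) distribution is (q, p)/(p + q).

```python
def step(distribution, transition):
    """Advance a probability distribution one step through a Markov chain.

    transition[i][j] is the probability of moving from state i to state j,
    so each row sums to 1 and the update is the row-vector/matrix product
    new[j] = sum_i distribution[i] * transition[i][j].
    """
    n_states = len(distribution)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n_states))
        for j in range(n_states)
    ]


def evolve(distribution, transition, n_steps):
    """Apply `step` repeatedly, i.e. multiply by the matrix n_steps times."""
    for _ in range(n_steps):
        distribution = step(distribution, transition)
    return distribution


def two_state_stationary(p, q):
    """Stationary distribution of the chain [[1 - p, p], [q, 1 - q]].

    Solving pi = pi @ P with pi summing to 1 gives (q, p) / (p + q).
    """
    return [q / (p + q), p / (p + q)]


if __name__ == "__main__":
    # Hypothetical two-state weather chain: state 0 = sunny, 1 = rainy.
    p, q = 0.1, 0.5  # P(sunny -> rainy), P(rainy -> sunny)
    P = [[1 - p, p], [q, 1 - q]]
    start = [1.0, 0.0]  # start on a sunny day
    for n in (1, 2, 5, 20):
        print(n, evolve(start, P, n))
    print("stationary:", two_state_stationary(p, q))
```

Changing p and q and re-running mimics what the interactive visualisation does: the chain always settles to the same stationary distribution, but how quickly it gets there depends on how far 1 - p - q is from zero.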
Zoubin Ghahramani has written a really nice review paper on Bayesian non-parametrics that I recommend checking out if you’re interested in the modelling techniques for complex data sets that have emerged over the last few years.
Exercism.io is a new service for learning how to master programming by getting feedback on exercises.
Interested in collaborative use of R, MATLAB, etc. for analysis and visualisation within a webpage? Combining plotly and IPython can help you with that.
Cosmopolitan (yes, that Cosmopolitan) has a great article interviewing Emily Graslie, Chief Curiosity Officer at the Field Museum in Chicago. She discusses being an artist and making the transition into science, science education and YouTube stardom.
A few of the PhD students in my lab have asked if I could run an introduction to R session. I’d mentioned the CAR workshop that I’d be doing, but the cost had put them off. Luckily, there are alternatives like Datacamp, Coursera and Lynda. Coursera’s next round of “Data Science”, delivered by Johns Hopkins University, starts next Monday (Course 1 – R Programming). So get in there and learn some R! I’m considering recommending some of these Coursera courses to my current SEB113 students who want to go a bit further with R, but the approach these online modules take is quite different to what we do in SEB113, and I don’t want them to get confused.
ARC Discovery Project assessments have been returned to applicants, and we are putting together our rejoinders. It was interesting to see a comment suggesting that we use the less restrictive CC BY licence instead of the CC BY-NC-SA licence we’d proposed. We weren’t successful in our Linkage Project applications, which is disappointing as they were interesting projects (well, we thought so). Continuing to bring in research funding is an ongoing struggle for all research groups, and I feel it’s only going to get harder, as the new federal government’s research priorities appear to be more aligned to medical science that delivers treatments than to our group’s traditional strengths.
SEB113 is pretty much over for the semester, with marks entered for almost every student. Overall, I think the students did fairly well. We had some issues with the timetable this semester. Ideally, we’d have the lecture, then all of the computer labs, then all of the workshops, so that we can introduce a statistical idea, show the code and then apply the idea and code in a group setting. Next semester, the lecture will be followed immediately by the workshops, with the computer labs dotted throughout the remainder of the week. This gives us an opportunity to try some semi-flipped classroom ideas, where students are expected to work through the computer lab at home at their own pace rather than watch a tutor explain it one line at a time at the front of a computer lab.
I’m teaching part of a two-day course on the use of R in air pollution epidemiology. My part will give a brief overview of Bayesian statistics, discuss prior distributions as a means of encoding a priori beliefs about model parameters, and cover the use of Bayesian hierarchical modelling (as opposed to more traditional ANOVA techniques) as a way of making the most of the data that’s been collected. The other two presenters are Dr Peter Baker and Dr Yuming Guo. The course is being run by the CAR-CRE, who partially fund my postdoctoral fellowship.
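One way to see how a hierarchical model "makes the most of the data" is partial pooling. Below is a minimal Python sketch assuming the simplest normal-normal model, with the within-group sd sigma, between-group sd tau and overall mean mu all treated as known (in a full Bayesian analysis these would get their own priors and be estimated, typically by MCMC). The group data are hypothetical. Each group’s estimate becomes a precision-weighted average of its own sample mean and mu, so groups with little data borrow strength from the rest.

```python
import math


def partial_pool(group_mean, n_obs, sigma, tau, mu):
    """Posterior mean and sd of one group's mean in a normal-normal model.

    Assumes y_ij ~ Normal(theta_j, sigma^2) and theta_j ~ Normal(mu, tau^2),
    with sigma, tau and mu treated as known. The posterior mean is a
    precision-weighted average of the group's sample mean and mu.
    """
    data_precision = n_obs / sigma**2
    prior_precision = 1 / tau**2
    total_precision = data_precision + prior_precision
    post_mean = (data_precision * group_mean + prior_precision * mu) / total_precision
    post_sd = math.sqrt(1 / total_precision)
    return post_mean, post_sd


if __name__ == "__main__":
    # Hypothetical groups: (sample mean, number of observations).
    groups = {"site A": (2.0, 50), "site B": (2.0, 3), "site C": (-1.0, 1)}
    sigma, tau, mu = 1.0, 0.5, 0.5  # assumed known for this sketch
    for name, (ybar, n) in groups.items():
        mean, sd = partial_pool(ybar, n, sigma, tau, mu)
        print(f"{name}: raw mean {ybar:+.2f} -> pooled {mean:+.2f} (sd {sd:.2f})")
```

Sites A and B have the same raw mean, but B’s estimate is pulled further towards mu because it rests on only three observations; a separate-means ANOVA-style analysis would treat both the same.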
I had meant to post this back when they were doing the rounds, but there are a bunch of plots that attempt to show that correlation isn’t causation and that spurious correlations exist in large data sets. Tom Christie has responded by pointing out that correlation between time series isn’t as simple as it is for independent, identically distributed data. One should be careful that one’s criticism of bad statistics is itself founded on good statistics.
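To see why time series need extra care, here’s a small Python simulation (my own illustration, not Tom Christie’s analysis). Two independent random walks often show a large sample correlation simply because each one wanders, whereas the correlation between their step-to-step changes (the differenced series, which here are i.i.d.) is close to zero. Differencing is one standard remedy among several; the series length, number of simulations and seed are arbitrary choices.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_walk(length, rng):
    """Cumulative sum of independent standard normal steps."""
    position = 0.0
    path = []
    for _ in range(length):
        position += rng.gauss(0.0, 1.0)
        path.append(position)
    return path


def diff(series):
    """First differences: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]


def mean_abs_correlation(n_sims=200, length=100, seed=1):
    """Average |r| for pairs of independent walks, in levels and in differences."""
    rng = random.Random(seed)
    levels, changes = [], []
    for _ in range(n_sims):
        x = random_walk(length, rng)
        y = random_walk(length, rng)  # generated independently of x
        levels.append(abs(pearson(x, y)))
        changes.append(abs(pearson(diff(x), diff(y))))
    return sum(levels) / n_sims, sum(changes) / n_sims


if __name__ == "__main__":
    levels, changes = mean_abs_correlation()
    print(f"mean |r|, levels:      {levels:.2f}")
    print(f"mean |r|, differences: {changes:.2f}")
```

None of these pairs are related, yet the levels regularly produce correlations that would look "significant" if you wrongly treated the observations as independent.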
I’m teaching science students how to do statistics. It would be great if we could turn them into Bayesians, especially seeing as we’ve just covered the Agresti-Coull correction for estimating proportions from small experiments. Andrew Gelman has an interesting paper on teaching Bayesian statistics to non-statisticians that focuses on delivering skills rather than concepts. I definitely agree with his approach, particularly his emphasis on discussing the model as probably the most important part.
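For reference, here’s a minimal Python sketch of the Agresti-Coull interval next to the textbook Wald interval. Agresti-Coull adds z²/2 successes and z²/2 failures before applying the Wald formula; with z ≈ 2 its centre is (x + 2)/(n + 4), which is exactly the posterior mean under a Beta(2, 2) prior, part of why it makes a nice bridge to Bayesian thinking. Clipping the limits to [0, 1] is a common practical choice I’ve assumed here, not part of the definition.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Textbook Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def agresti_coull_interval(successes, n, z=1.96):
    """Agresti-Coull interval: add z^2/2 successes and z^2/2 failures, then use Wald.

    With z = 1.96 this is roughly "add two successes and two failures".
    """
    n_tilde = n + z**2
    p_tilde = (successes + z**2 / 2) / n_tilde
    half_width = z * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return max(0.0, p_tilde - half_width), min(1.0, p_tilde + half_width)


if __name__ == "__main__":
    # 0 successes in 10 trials: Wald collapses to a zero-width interval.
    print("Wald:          ", wald_interval(0, 10))
    print("Agresti-Coull: ", agresti_coull_interval(0, 10))
```

The zero-success case is exactly the small-experiment situation where the Wald interval claims we’re certain the proportion is zero, while Agresti-Coull still gives a sensible range.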
NASA have done some work simulating global aerosols and it’s been compiled into a neat video (via It’s Okay To Be Smart’s Joe Hanson). CSIRO have been doing some interesting stuff looking at the production of organic aerosols as well, so this is something I’m paying a bit more attention to at the moment.
Datacamp is a set of online labs for learning to use R, covering the basics of R, data analysis and statistical inference, and computational finance and econometrics.
Learn to be a better coder by improving your communication skills. The most practical aspects (in terms of coding, at least) are using meaningful names and writing comments that explain what the code does when that isn’t clear from the code itself.
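A small before-and-after sketch in Python of those two habits (the example is my own, not from the linked article). Both functions compute the same sample variance, but the second uses names that say what each value is and saves its one comment for the thing the code can’t say for itself.

```python
# Before: terse names hide what is being computed.
def calc(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)


# After: descriptive names, plus a comment explaining the non-obvious choice.
def sample_variance(observations):
    n = len(observations)
    mean = sum(observations) / n
    squared_deviations = sum((value - mean) ** 2 for value in observations)
    # Divide by n - 1 rather than n so the estimate is unbiased for the
    # population variance (Bessel's correction).
    return squared_deviations / (n - 1)
```

Notice that the comment doesn’t restate "compute the mean"; the names already do that.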
If you’re in Melbourne, Australia, you should consider popping along to Laborastory at the Cider House (Brunswick), where five scientists get up on the first Tuesday of the month to tell the stories of the heroes and history of their field. A friend of mine went along this week and enjoyed it immensely.
SEB113 has started again. I’ve already run five workshops (I have three a week), introducing a whole new cohort of students to R, RStudio and ggplot. Last week we threw some paper planes and looked at how simple use of faceting, colouring and stacked histograms can reveal both overall variation and group-to-group variation. A few students are still bewildered by the idea of writing code to make pictures, but they recognise that it’s just a matter of practice.
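The numerical counterpart of what the faceted and coloured plots show is the split of total variation into within-group and between-group parts. Here’s a minimal Python sketch using hypothetical paper plane distances (not our class data); the function name is my own.

```python
def sum_of_squares_decomposition(groups):
    """Split total variation into within-group and between-group parts.

    groups maps a group label to a list of measurements. Returns
    (total, within, between), where total == within + between up to
    floating-point rounding.
    """
    all_values = [value for values in groups.values() for value in values]
    grand_mean = sum(all_values) / len(all_values)
    total = sum((value - grand_mean) ** 2 for value in all_values)

    within = 0.0
    between = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Spread of each group around its own mean.
        within += sum((value - group_mean) ** 2 for value in values)
        # Spread of the group means around the grand mean, weighted by group size.
        between += len(values) * (group_mean - grand_mean) ** 2
    return total, within, between


if __name__ == "__main__":
    # Hypothetical flight distances (metres) for two plane designs.
    flights = {
        "dart": [4.1, 5.0, 4.6, 3.8],
        "glider": [6.2, 7.1, 6.8, 5.9],
    }
    total, within, between = sum_of_squares_decomposition(flights)
    print(f"total {total:.2f} = within {within:.2f} + between {between:.2f}")
```

A histogram of all flights shows the total; faceting or colouring by design separates it into the spread inside each panel (within) and the shift between panels (between).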