One of the benefits of waking up stupidly early is that you can arrange to have coffee and a chat with a friend before work and still get there on time. A friend I haven’t seen in a while contacted me recently to ask a few questions about learning R for data analysis and visualisation. While they won’t need to formally learn statistics and visualisation for their work, it certainly doesn’t hurt to be able to analyse data better and make more informative, easier-to-interpret graphs.
My friend hasn’t done any statistics since high school Maths B, approximately ten years ago, which makes them similar to many of my SEB113 students. They have done a bit of programming along the way as a hobby, which will of course be a huge help. Having downloaded R and had a crack at a ggplot2 tutorial, they were confident that they could learn R even though they didn’t really understand what the tutorial was doing. Over some avocado on toast, we sat down with the tutorial and worked through what each function’s arguments represented, what the data frame was made of, how ggplot2 implements a grammar of graphics, and how we can keep adding elements to the code to change the plot.
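The idea of building a plot up piece by piece is easier to see in code. What follows is a minimal sketch, not the tutorial we worked through: it uses R’s built-in `mtcars` data set as a stand-in, and the choice of columns (`wt`, `mpg`) and labels is purely illustrative. Each `+` adds one more element (a layer of points, a fitted line, axis labels) to the same plot object.

```r
library(ggplot2)

# Start with the data and the aesthetic mapping: which columns go on which axes.
# Nothing is drawn yet because there are no layers.
p <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# Add a layer of points, one per car
p <- p + geom_point()

# Add a second layer: a straight-line (linear model) fit with its confidence band
p <- p + geom_smooth(method = "lm", formula = y ~ x)

# Add axis labels; this changes the plot without adding a layer
p <- p + labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)
```

Because each step returns a new plot object, you can stop after any line and print `p` to see what that element contributed, which is a handy way to take apart someone else’s ggplot2 code.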
To an extent, the ability to work through some code without being able to explain what it is doing is typical of an SEB113 student in the first half of the subject (where we provide the code and get them to run it). It’s not until later in the semester, when the computer labs stop, that we expect them to turn their ideas into code (and they’re welcome to cannibalise the code we provide) to write their quantitative workbooks. I suggested the Coursera course that started yesterday as a way to get a bit more familiar with how R works and to get recognition for completing it (which isn’t a recognised qualification but is evidence of being interested enough to pursue it).
These days I’m always on the lookout for better ways to introduce SEB113 students to R and ggplot2. Via Matt Asher’s “Statistics Blog” I found the following tutorials, which I have passed on to my friend and the SEB113 teaching team; I have copied and pasted the text directly:
I had no idea that the coefplot package existed! That’s going to make visualisation of fitted linear models much easier for our students, as we’ve previously had them using geom_segment to manually plot estimates and confidence intervals.
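To show the difference, here is a rough sketch of both routes for a simple linear model. The model (`mpg ~ wt + hp` on the built-in `mtcars` data) is a hypothetical example chosen only for illustration. The manual route builds a data frame of estimates and 95% intervals from `coef()` and `confint()` and draws them with `geom_segment`; the coefplot route is a single call to `coefplot::coefplot()` on the fitted model. As far as I know, coefplot’s default intervals are based on multiples of the standard error rather than on `confint()`, so the two plots won’t necessarily match exactly; check the package documentation before relying on its defaults.

```r
library(ggplot2)

# A hypothetical model, purely for illustration
fit <- lm(mpg ~ wt + hp, data = mtcars)
ci <- confint(fit, level = 0.95)

# Manual route: collect estimates and 95% confidence limits in a data frame
est <- data.frame(
  term     = names(coef(fit)),
  estimate = unname(coef(fit)),
  lower    = unname(ci[, 1]),
  upper    = unname(ci[, 2])
)

# One horizontal segment per interval, a point at each estimate,
# and a dashed reference line at zero
manual <- ggplot(est, aes(y = term)) +
  geom_segment(aes(x = lower, xend = upper, yend = term)) +
  geom_point(aes(x = estimate)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(x = "Estimate (95% CI)", y = NULL)
print(manual)

# coefplot route: one function call, only if the package is installed
if (requireNamespace("coefplot", quietly = TRUE)) {
  print(coefplot::coefplot(fit))
}
```

The manual version is still worth seeing once, because it makes clear that a coefficient plot is just points and line segments; coefplot saves students from writing that boilerplate every time.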
This is part of what I love about R, compared to, say, SAS. There’s a huge community of people out there adding functionality to an open source project by building on each other’s work. GGally and coefplot both require ggplot2 and have a lot of really nice functions that extend ggplot2’s publication-quality graphics. The community is quite active, and if you can think of a question about R, there’s probably an answer out there already.