One of the things I catch myself saying quite often in SEB113 is “This is new. It’s hard. But remember, you weren’t born knowing how to walk. You learned it”, as my way of saying that it’s okay to not understand this straight away, it takes time, practice and determination. I often say this in response to students complaining about learning R to do their data analysis. It’s actually got to the point where the unit co-ordinator suggested I get a t-shirt printed with “You weren’t born knowing how to walk” on the front and “So learn R” on the back.

One of the reasons I’m so keen to push new students into learning R is that while Excel can do some of the simpler calculations required in the first year of a science degree it is often completely inadequate for doing data analysis as a professional scientist, or even in an advanced level university course. I actually saw a senior researcher in a 3 day Bayesian statistics course try to avoid using R to code a Gibbs sampler by getting it up and running in Excel. They managed it, but it took minutes to run what the rest of us could compute in a second (and it was for a trivially simple problem).

There are problems with Excel, such as its inability to deal with the standard deviation of a group of very large numbers due to its bizarre formulation. Apparently the secret to sane use of Excel is to only use it for data storage. This guiding principle has meant that I no longer manipulate my data in Excel. Even with time stamp information I’ll fire up the lubridate package to convert from one format to another. I’m slowly exploring the Hadleyverse and that sort of approach is filtering through into SEB113 where we’re teaching the use of ggplot2 and reshape2 within RStudio. These are all powerful tools that simplify data analysis and avoid the hackish feel that much Excel-based analysis has, where pivot tables are a thing and graphs are made by clicking and dragging a selection tool down the data (which can lead to some nasty errors).

The fact that these powerful tools that make data analysis simple are free is another reason to choose R over Excel. I’m not on the “Open Source Software and provision of all code is mandatory” bandwagon as others seem to be when it comes to analysis being replicable. I agree it’s a worthwhile goal but it’s not a priority for me. That said, though, I definitely support encouraging the use of free software (in both senses) in education on the grounds of equity of access.

I had a chat with some students in SEB113 yesterday about why we’re teaching everything in R given that the SEB114 staff use a combination of Excel, MATLAB (and maybe even other packages I don’t know about). If we were to teach analysis the way that the SEB114 lecturers do it themselves, we’d have to teach multiple packages to multiple disciplines. Even discounting the fact that everything we teach is implemented in R, that R is free (unlike Excel and MATLAB), cross-platform (Excel on Linux? Try OpenOffice/OfficeLibre) and extensible (MATLAB has toolboxes, Excel has add-ins, R has a nice package manager) was a big plus for students who said that being able to work on assignments at home was valuable and so paying for software would make study difficult.

Convincing students to use R can be difficult, especially if they have no programming background, but ultimately they seem to accept that R is powerful, can do more than Excel and that writing reusable code makes future analysis easier. Convincing SEB114 academics that teaching their students to use R is a good idea is probably a harder sell, given that they’ve got years of experience with other tools. It’s still only semester 3 of the new Bachelor of Science course so we’ll have to see how this plays out over the years to come.