One of the biggest changes I noticed moving from doing statistics in Minitab (in first year data analysis) to doing statistics in R (in third year statistical inference) was that R encourages you to write functions. Normally this is done by writing functions in R’s own language (that call other functions, also written in R, which eventually call functions written in C) but it’s also possible to make use of other languages to do the heavy lifting. This isn’t unique to R, of course; MATLAB encourages the use of MEX files to improve run-times when you need to call the same custom function over and over again.

I’ve really only used high level languages to do my statistics, making use of other peoples’ optimised code that do the things that I want. I’ve seen the development of pyMCMC by reseachers at QUT and someone from my NPBayes reading group made quite heavy use of RCpp in his thesis. Python and C++ are probably the two languages that would be the most useful to learn given their ubiquity and reputation. I have been putting off learning these for years as I know that there’s a large time investment required to become proficient in programming and no external pressure to learn (unlike learning R as part of my PhD thesis work).

There’s no doubt that writing optimised code is a desirable thing to do, and that knowing more than one programming language (and how to use them together) gives you a much richer toolbox for numerically solving problems. I’m now at a point, though, where it looks like I may need to bite the bullet and pick up C++. JAGS, which I use through rjags in R, is a stable, fast platform for MCMC-based inference. It’s written in C++ and notifies you every time you load it in R that it has loaded the basemod and bugs modules. There are additional modules available (check in \JAGS\JAGS-3.4.0\x64\modules\) and it’s possible to write your own, as long as you know C++.

I’m at a point with the work I’ve been doing on estimating personal dose of ultrafine particles that I’d like to make the modelling more Bayesian, which includes figuring out a way to include the deposition model in the MCMC scheme (as I’d like to put a prior on the shape parameter of the reconstructed size distribution). My options seem to be either writing a JAGS module that will allow me to call a C++ified version of the function or to abandon JAGS and write a Gibbs sampler (or Metropolis-Hastings, but Gibbs will likely be quicker given the simplicity of the model I’m interested in). Either solution will stretch me as a programmer and probably give me a better understanding of the problem. Eubank and Kupresanin’s “Statistical Computing in C++ and R” is staring at me from the shelf above my desk.

Dimitri VasdekisSam, let me know if you find a decent resolution. I’ve also been keen to pick up C++ but I get frustrated when the learning environment isn’t ‘top to bottom’.

I’ve been learning Python via LPTHW to get a top-down understanding, and casually reading ‘The Art of Assembly’ to get a bottom-up view (it’s really readable! http://goo.gl/9W9Flb) but to turn the broader knowledge into something useful in a work context will require literally thousands of hours of commitment from me, for an indeterminate benefit (I have never seen anyone in Australia program in C++ in a work context).

I have some projects that I want to build in C++, but I really should just be using a Ruby on Rails framework – the days of being paid to build boilerplate are over.

Sam CliffordPost authorWell now that I’ve got the Gibbs sampler for my toy model mostly working in R I’m going to look at speeding it up by putting it in C++, which appears to be a pretty common thing in Bayesian stats at QUT. It’s ridiculously slow, so I’ll run some profiling on it to see if there’s something I can make much quicker, maybe even put it in MATLAB and compare speeds there. One day, when I have time…

Sam CliffordPost authorUpdate: I got this working in MATLAB and it’s reasonably fast. I just hate that MATLAB does the non-Bayesian interpretation of the Gamma distribution.

benrfitzpatrickHi Sam, have you heard about that new Julia Language? http://julialang.org/

It sounds faster than R and easier to learn than C++, I’ve not had a good enough reason to justify taking time out from my work to try it yet but I’m bearing it in mind should I run into an unavoidable computational challenge that is beyond R.

Sam CliffordPost authorI’ve heard of it and seen it discussed from time to time on the internet. I think Nick Tierney might’ve given the iJulia+gadfly demo at the BRAG visualisation workshop. I wonder if QUT’s HPC group has any experience with it.

nicholastierneyIt appears that there is a discussion of calling Julia from within R here: http://stackoverflow.com/questions/9965747/linking-r-and-julia

However, it seems like this is a ways away from becoming a reality.