Starting more papers

One of the benefits of being a statistically curious researcher is that you get to read about all sorts of cool stuff. The UPTECH project is generating a huge amount of time series data, some of which have change points, non-linear behaviour, trends, and all sorts of other quirks. I’ve spent most of my time learning about the use of splines, but over the last year I have been exposed to Gaussian processes (and I guess I would say splines are a special case) and Gaussian Markov random fields.

I’ve been having the occasional chat with the other researchers about how to analyse the time series data they’re working with and have stumbled across some really neat methods. Apart from the spline models I’ve been working on with my Finnish collaborators, interesting approaches to analysing time series data include Treed Linear Models, Treed Gaussian Processes [1,2] and Dirichlet Process Mixtures of GLMs [3].

The tree structure of the first two models is apparent in the way they partition the covariate space into regions. In the treed linear model, the behaviour within each region is treated as locally linear: change points are placed where the behaviour changes, and each partition has its own linear mean and its own variance estimate. This is a fairly simple model to fit, but it is limited by using only linear functions within each partition. The treed GP relaxes this by fitting a GP within each partition, with the focus on modelling the covariance structure. The third, DP mixtures of GLMs, gives much smoother estimates of the mean and credible intervals, and has some really nice properties courtesy of the DP, which looks to be superior to tree-based clustering.
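To make the partitioning idea concrete, here is a minimal, non-Bayesian sketch of the simplest possible treed linear model: a single split (a depth-one tree) on one covariate, where each side gets its own least-squares line and its own noise variance, and the split is chosen by the profile Gaussian log-likelihood. This is an illustration under my own assumptions, not the tgp API; the Bayesian treed models in [1,2] instead place a prior over trees and explore them by MCMC. The function names, `min_leaf`, and the simulated data are hypothetical.

```python
import numpy as np

# Illustrative sketch only (hypothetical helper names), not the tgp package.


def fit_linear(x, y):
    """Least-squares fit of y = a + b*x within one partition.

    Returns (coefficients, noise variance estimate).
    """
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # Maximum-likelihood variance for this partition only (guarded against 0).
    sigma2 = max(float(resid @ resid) / len(y), 1e-12)
    return coef, sigma2


def gaussian_loglik(n, sigma2):
    """Profile Gaussian log-likelihood of n residuals with MLE variance sigma2."""
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)


def best_single_split(x, y, min_leaf=5):
    """Find one change point so each side has its own line and variance.

    Candidate splits are midpoints between consecutive sorted x values;
    each side must keep at least `min_leaf` points.
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    for i in range(min_leaf, len(x) - min_leaf + 1):
        if x[i - 1] == x[i]:
            continue  # no valid split between tied x values
        left_coef, left_s2 = fit_linear(x[:i], y[:i])
        right_coef, right_s2 = fit_linear(x[i:], y[i:])
        ll = gaussian_loglik(i, left_s2) + gaussian_loglik(len(x) - i, right_s2)
        if best is None or ll > best["loglik"]:
            best = {
                "split": 0.5 * (x[i - 1] + x[i]),
                "left": (left_coef, left_s2),
                "right": (right_coef, right_s2),
                "loglik": ll,
            }
    return best


# Usage: simulated slope change with a jump at x = 5.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = np.where(x < 5, 2 * x, 20 - 3 * (x - 5)) + rng.normal(0, 0.3, x.size)
fit = best_single_split(x, y)
print(round(fit["split"], 2), fit["left"][0].round(2), fit["right"][0].round(2))
```

Applying `best_single_split` recursively within each side would grow a deeper tree; a penalty or prior on the number of leaves is then needed, since more splits always increase the likelihood.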

I find the tree structure of these models quite interesting; conceptually, the treed linear model appears to be a mix of a multiple changepoint model and a piecewise linear regression spline with wombling knots. I’m not yet 100% sure how to apply these, but an initial chat makes me think they will be very applicable, and I’m looking forward to some exploratory data analysis.
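For comparison with the changepoint view, here is a minimal sketch of a piecewise linear regression spline with a single fixed knot, using the truncated power basis 1, x, (x − k)₊. Unlike a treed or changepoint fit, the spline is forced to be continuous at the knot and shares one noise variance across the whole range; in a free-knot version the knot location would itself be estimated. The helper names, the knot at x = 5 and the simulated data are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only (hypothetical helper names and fixed knot).


def linear_spline_basis(x, knots):
    """Truncated power basis for a continuous piecewise linear spline:
    columns 1, x, and (x - k)_+ for each knot k."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)


def fit_linear_spline(x, y, knots):
    """Least-squares fit; returns coefficients and a function to evaluate the fit."""
    B = linear_spline_basis(x, knots)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return coef, lambda x_new: linear_spline_basis(x_new, knots) @ coef


# Simulated slope change at x = 5 with no jump: 2x before, 10 - 3(x - 5) after.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = np.where(x < 5, 2 * x, 10 - 3 * (x - 5)) + rng.normal(0, 0.3, x.size)

coef, predict = fit_linear_spline(x, y, knots=[5.0])
slope_before = coef[1]
slope_after = coef[1] + coef[2]  # each knot adds a change in slope
print(slope_before.round(2), slope_after.round(2))
```

The basis builds continuity in: each (x − k)₊ term is zero at its knot, so it can only change the slope there, never introduce a jump. A changepoint model drops that constraint and lets each segment have its own intercept and variance.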

[1] Gramacy, R. B. (2007). tgp: An R package for Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian process models. Journal of Statistical Software 19(9), 1–46.

[2] Gramacy, R. B. and H. K. H. Lee (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association 103(483), 1119–1130. (arXiv preprint)

[3] Hannah, L. A., D. M. Blei, and W. B. Powell (2011). Dirichlet process mixtures of generalized linear models. Journal of Machine Learning Research 12, 1923–1953.
