Difficult MCMC

2013 June 3
by Daniel Lakeland

Following up on what I was doing last year about this time, I’ve got a model for a physical phenomenon, and it involves a multidimensional ODE. The result is a timeseries and the quantity of interest is how well the ODE predicts the decay of kinetic energy in the system that it models. The system it models is a many-body system where I have statistical measurements from detailed simulations of every body in the system. I have a pretty good idea of the size of several parameters in the system. I wanted to fit the parameters and see how well my model works.

Actually, I already did this last year, but I fit the model to a kind of low-order fourier series approximation of what actually happens. Based on that, and some theoretical modeling, I came up with my ODE last winter. Now I actually want to fit the output of the ODE to the data.

Since the likelihood function is based on the error between the ODE prediction and the observed data, I need to run the ODE at each step of my MCMC, for each timeseries. I’m currently using a random sample of 20 example experiments. So I have 20 timeseries with about 80 observations in each timeseries. I am assuming (incorrectly but conveniently) independent t distributed errors in the time series.

I had been having a problem where the MCMC got trapped in a very sub-optimal place because a few of the timeseries were fit very well. I tried parallel tempering the posterior and found that the different temperatures virtually never swapped. The landscape must be pretty rough and obnoxious.

Eventually, I realized the folk theorem of statistics was probably at play. There was a problem with the model. By graphing the data and the model together with error bars I realized that the size of the errors that I was specifying only took one source of error into account (more or less thermal error) there was also modeling error to consider. Specifying that model-misfit might be a few percent suddenly allowed the MCMC to move around on a much bigger scale.

I realized that tempering the likelihood in an independent errors problem is pretty much like specifying a bigger error scale, for example in a gaussian error model: $$(\sum -1/2 (e/s)^2)/T = \sum -1/2 (e/\sqrt{T}s)^2$$. The same is not quite true in non-gaussian cases, but it has the same character, a very small error scale means that your model must fit your data very well, relaxing that fit via tempering and relaxing that fit via a better estimate of the error scale are similar. In my case, I’m being a bit lazy and not including a per-timeseries error scale, yet.

So far I’ve been using Geyer’s “mcmc” package for R, since it calls an R function to find the posterior log-density, and then I can run my ODEs. By the way, this takes a long time. On my laptop, one evaluation of the posterior density function takes about 5 or 10 seconds or something.

In any case, now I can jump in MCMC with jump sizes that represent lengths larger than a few fractions of an atomic diameter 🙂

Comments are closed.