A strategy for MCMC on rough expensive landscapes?

2013 June 4
by Daniel Lakeland

My model is fitting a little better now, but it is still slow, much slower than I'd like. The parallel tempering sampling I am doing is expensive and doesn't seem to buy me much improvement in moving around the space of possibilities. I've been thinking about how I'd rather do this, and came up with the following idea.

The idea behind tempering is essentially to make acceptance easier by making the PDF less rough, keeping the valleys from getting too low so we can move from one local maximum to another. The high-temperature distributions are not much like the low-temperature distribution of interest anyway, so in particular it shouldn't matter if we approximate them by something cheap to evaluate.

Suppose we start at some point and sample the exact tempered distribution at high temperature, with log density F(x)/T where F(x) = \log(p(x)) and T \gg 1, making Markov jumps with some simple method like independent multivariate normal proposals. At each jump we store the pair (x, F(x)), and we continue until we are satisfied that we have sampled the high-probability manifold of the tempered distribution. Remember that this is much easier, and the jumps can be much bigger, than for the untempered distribution F(x)/1.
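
Just to make that concrete, here's a minimal sketch of the high-temperature exploration phase in Python. The names and signatures are all mine for illustration; I'm assuming the parameters live in a flat numpy vector and that F is a callable returning \log p(x) (the expensive part, e.g. an ODE run).

```python
import numpy as np

def explore_high_temperature(F, x0, T, n_steps, step_scale, rng=None):
    """Metropolis exploration of the tempered log density F(x)/T.

    F          -- callable returning log p(x); the expensive part (ODE runs etc.)
    x0         -- starting point, a 1-D numpy array
    T          -- high temperature, T >> 1
    step_scale -- standard deviation of the independent normal proposal
    Returns the visited points and their exact F values, for the later RBF fit.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    fx = F(x)
    xs, fs = [x.copy()], [fx]
    for _ in range(n_steps):
        # Big jumps are fine here: dividing by T flattens the landscape.
        x_prop = x + step_scale * rng.standard_normal(x.shape)
        f_prop = F(x_prop)
        # Metropolis acceptance for the tempered target exp(F(x)/T).
        if np.log(rng.uniform()) < (f_prop - fx) / T:
            x, fx = x_prop, f_prop
        xs.append(x.copy())
        fs.append(fx)
    return np.array(xs), np.array(fs)
```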

Now we use this discrete dataset to construct an intermediate-temperature approximate function Q(x) = RBF(F(x)/T) \cdot T/K, where RBF(F(x)/T) means some radial basis function approximation to the data we got from F(x)/T, and K < T. We are now approximating F(x)/K, a less-tempered version at temperature K. We can evaluate the RBF approximation in a tiny fraction of the time it takes to evaluate F, since F requires running an ODE simulation or something similar, and the surrogate need not be a fully accurate approximation since we're only using it to get fast umbrella samples.

We now run a Markov chain that takes a random number of steps on Q(x), with the number of steps much larger than 1 on average, perhaps exponentially distributed with a mean of 10 to 100. After that, we begin a simulated annealing process in which we "untemper" the approximate distribution according to some schedule Q(x) T(n), where T(n) grows to K for large n, so that Q(x) T(n) approaches the RBF approximation of F(x) itself. After some number of annealing steps, the final state is our proposal for the lowest-temperature distribution F(x).
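
Here's a sketch of the surrogate construction and the umbrella-plus-annealing proposal, again just to pin the idea down. I'm using scipy's RBFInterpolator as one convenient radial basis function fit; the linear schedule, the shrinking step size, and all the names and defaults are my own assumptions, not anything fixed by the scheme above.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def make_surrogate(xs, fs, K):
    """Fit an RBF to the stored (x, F(x)) pairs and return Q(x) ~ F(x)/K.

    xs -- (N, d) points visited by the high-temperature chain
    fs -- (N,)   exact F values at those points
    K  -- intermediate temperature, 1 < K < T
    """
    f_hat = RBFInterpolator(xs, fs)            # cheap stand-in for F
    return lambda x: f_hat(np.atleast_2d(x))[0] / K

def surrogate_proposal(Q, x0, K, step_scale, n_anneal=50, mean_umbrella=30,
                       rng=None):
    """One proposal: a random-length umbrella run on Q at temperature K,
    then annealing Q(x)*T(n) with T(n) growing to K, i.e. back toward F."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    qx = Q(x)

    # Umbrella phase: cheap Metropolis steps on Q(x), with an exponentially
    # distributed number of steps whose mean is much larger than 1.
    n_umbrella = max(1, int(rng.exponential(mean_umbrella)))
    for _ in range(n_umbrella):
        x_prop = x + step_scale * rng.standard_normal(x.shape)
        q_prop = Q(x_prop)
        if np.log(rng.uniform()) < q_prop - qx:
            x, qx = x_prop, q_prop

    # Annealing phase: "untemper" by scaling the log density up to K*Q(x),
    # shrinking the step size as the landscape sharpens.
    for n in range(1, n_anneal + 1):
        t_n = 1.0 + (K - 1.0) * n / n_anneal   # T(n) -> K
        x_prop = x + (step_scale / t_n) * rng.standard_normal(x.shape)
        q_prop = Q(x_prop)
        if np.log(rng.uniform()) < t_n * (q_prop - qx):
            x, qx = x_prop, q_prop
    return x
```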

Now, your first objection may be that this doesn't seem to satisfy detailed balance. That may be true, but I don't think it matters. In fact, Manousiouthakis and Deem have shown that we only need regular sampling and "balance". Balance means that the move leaves the equilibrium distribution unchanged, and this scheme seems to do that, since we only accept the final proposal according to the Metropolis acceptance probability. The condition for regular sampling is that every state must be reachable with some nonzero probability after some fixed number of steps, and the intermediate-temperature umbrella sampler can reach anywhere in the state space since the RBF approximation is finite everywhere. It seems like this has been done before; in fact, a recent paper on adaptive Monte Carlo ideas (also with Deem) explicitly mentions this kind of thing.
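
Continuing the sketch above (and reusing surrogate_proposal from it), the outer loop looks like this: the whole cheap surrogate run produces a single candidate, which is accepted or rejected against the exact F, so the expensive function is evaluated only once per proposal. This treats the annealed endpoint as if it were a symmetric Metropolis proposal, which is exactly where the detailed-balance question comes in.

```python
import numpy as np

def mcmc_step(F, Q, x_current, f_current, K, step_scale, rng):
    """One step of the overall chain: generate a candidate with the cheap
    surrogate machinery, then accept/reject against the exact, expensive F."""
    x_cand = surrogate_proposal(Q, x_current, K, step_scale, rng=rng)
    f_cand = F(x_cand)                         # the only expensive call here
    # Plain Metropolis acceptance on the untempered target exp(F(x)).
    if np.log(rng.uniform()) < f_cand - f_current:
        return x_cand, f_cand
    return x_current, f_current
```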

One Response
  1. Daniel Lakeland
    June 4, 2013

    One concern I have is that these results seem to be stated in the language of discrete state spaces, like Ising lattices. Whether the whole thing carries over when the parameters are continuous may be more complicated. Still, it seems likely to work, and the experimental evidence from people doing this kind of computation suggests there should be some general theorem for continuous state spaces.
