Gaussian Copula Process errors for ODE models

2013 June 18
by Daniel Lakeland

I have a model where a multidimensional ODE predicts the statistical dynamics of a very high dimensional ODE. In essence I'm predicting the dissipation of wave energy in a molecular system. In order to test my theory I want to fit the coefficients of my ODE to actual data statistically sampled from big computational runs of the molecular dynamics.

The behavior I want to constrain is the proper dissipation of the total kinetic energy carried by the wave. Once I have that, I want to show that the mechanism behaves appropriately by comparing the time history of the velocity of a single degree of freedom. So what I need is to fit the coefficients of my model to match the energy decay. I have informative priors for the size of the coefficients from theoretical analysis.

The biggest problem is how to fit this timeseries. As with almost all timeseries, if two timepoints are nearby then they have similar statistical errors. If my model predicts too much energy at time t then it will probably predict too much at t+\epsilon for "small" \epsilon.

The "obvious" way to deal with this is via a gaussian process, in other words we impose a covariance structure on the errors based on a covariance function which compares how close in time two measurements are. The problem with the straightforward application of gaussian processes to the errors, is that in this situation there are definitely some complex processes in the molecular dynamics which are not modeled by my ODE. This leads to outliers (for example in one run the molecular dynamics became unstable right before the end of the run, and energy shot off to orders of magnitude more than it really should have been. In other runs the initial conditions probably caused melting or recrystalization in the first few percent of the timeseries resulting in a dramatic loss of kinetic energy). I am not interested in these modeling errors for my purposes, I know they exist, I simply want to show that in many cases they aren't important and focus on the cases where my simple model works. One way to do this is to specify that the errors have fat tails, like a t distribution. After an off-topic discussion on Andrew Gelman's blog with a helpful commenter, the current error model looks like this.

 E_i = PE_i - AE_i

where PE is predicted energy and AE is actual measured energy.

u_i = pt((PE_i - AE_i)/s, d)

where s is an error scale, and d is the degrees of freedom, both s,d are hyperparameters of my bayesian statistical model, the u_i are now assumed to be uniform.

x_i = qnorm(u_i)

If the u_i really are uniform (ie. if the t distribution fits the marginal errors well), then these are gaussian with unit standard deviation and mean zero. I now specify that the x_i have a gaussian process structure, that is they are related by how close in time they are, the t_i values associated to the data points.

To get the likelihood of some observed E_i we, as usual, want to specify the log-density of the vector of E_i values. To do this, we need to go through the CDF. The CDF of a vector is a weird function, it's a multi-argument function (one for each dimension of the vector), and it specifies the probability that the random variable is less than the observed value in each dimensions. The dependency structure can be thought of in terms of a copula C(u_1,u_2,u_3,\dots,u_n). The u_i values are all marginally uniform, but the joint distribution is not uniform over the hypercube. In fact C(u_1,u_2,u_3,\dots,u_n) = pmvnorm(x_1,x_2,x_3,\dots,x_n; \sigma) with \sigma coming from the gaussian process covariance function. Hence, this C could be called a "Gaussian Process Copula"and the process is called a gaussian copula process.

To get the likelihood of a particular vector of E_i we can differentiate

 \frac{\partial pmvnorm(x_1\dots x_n; \sigma)}{\partial E_1 \partial E_2 \dots \partial E_n}

When we differentiate this thing we get dmvnorm(x_1\dots x_n; \sigma) \prod_i \frac{\partial x_i}{\partial E_i} and the x_i = qnorm(pt(E_i)) so \frac{\partial x_i}{\partial E_i} = \frac{1}{dnorm(x_i)}dt(E_i). Using the mvtnorm package in R I can calculate the log of all this mess and hand that to the mcmc package and do Bayesian inference. Running last night it seems to have produced a significantly better fitting model that can move through space much easier, apparently a less rough landscape thanks to the error dependency.

Many thanks to "Corey" on Gelman's blog for the suggestion any inaccuracies in the above are entirely due to my attempting to figure the details out on my own.


One Response leave one →
  1. Corey permalink
    June 18, 2013

    Glad it turned out to be useful!

Leave a Reply

Note: You can use basic XHTML in your comments. Your email address will never be published.

Subscribe to this comment feed via RSS