Gaussian Copula Process errors for ODE models

2013 June 18
by Daniel Lakeland

I have a model where a multidimensional ODE predicts the statistical dynamics of a very high dimensional ODE. In essence I’m predicting the dissipation of wave energy in a molecular system. In order to test my theory I want to fit the coefficients of my ODE to actual data statistically sampled from big computational runs of the molecular dynamics.

The behavior I want to constrain is the proper dissipation of the total kinetic energy carried by the wave. Once I have that, I want to show that the mechanism behaves appropriately by comparing the time history of the velocity of a single degree of freedom. So what I need is to fit the coefficients of my model to match the energy decay. I have informative priors for the size of the coefficients from theoretical analysis.

The biggest problem is how to fit this timeseries. As with almost all timeseries, if two timepoints are nearby then they have similar statistical errors. If my model predicts too much energy at time $$t$$ then it will probably predict too much at $$t+\epsilon$$ for “small” $$\epsilon$$.

The “obvious” way to deal with this is via a gaussian process, in other words we impose a covariance structure on the errors based on a covariance function which compares how close in time two measurements are. The problem with the straightforward application of gaussian processes to the errors is that in this situation there are definitely some complex processes in the molecular dynamics which are not modeled by my ODE. This leads to outliers (for example, in one run the molecular dynamics became unstable right before the end of the run and the energy shot up to orders of magnitude more than it really should have been; in other runs the initial conditions probably caused melting or recrystallization in the first few percent of the timeseries, resulting in a dramatic loss of kinetic energy). I am not interested in these modeling errors for my purposes: I know they exist, and I simply want to show that in many cases they aren’t important and focus on the cases where my simple model works.

One way to do this is to specify that the errors have fat tails, like a t distribution. After an off-topic discussion on Andrew Gelman’s blog with a helpful commenter, the current error model looks like this.

\[ E_i = PE_i - AE_i \] where $$PE$$ is predicted energy and $$AE$$ is actual measured energy.

\[u_i = pt((PE_i - AE_i)/s, d) \] where $$s$$ is an error scale and $$d$$ is the degrees of freedom; both $$s,d$$ are hyperparameters of my bayesian statistical model, and the $$u_i$$ are now assumed to be marginally uniform.

\[x_i = qnorm(u_i)\] If the $$u_i$$ really are uniform (i.e. if the t distribution fits the marginal errors well), then these are gaussian with unit standard deviation and mean zero. I now specify that the $$x_i$$ have a gaussian process structure, that is, they are correlated according to how close in time they are (the $$t_i$$ values associated with the data points).
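In R the marginal transformation is just a composition of pt and qnorm. A minimal sketch, using made-up residuals and hypothetical values for $$s$$ and $$d$$ (in the actual fit those values come from the sampler at each iteration):

    ## Illustrative only: fake residuals standing in for PE - AE,
    ## and hypothetical values of the hyperparameters s and d.
    set.seed(1)
    E <- 0.1 * rt(50, df = 5)      # pretend residuals, predicted minus actual energy
    s <- 0.1                       # error scale (hypothetical value)
    d <- 5                         # degrees of freedom (hypothetical value)

    u <- pt(E / s, df = d)         # marginally ~ Uniform(0,1) if the t fits the errors
    x <- qnorm(u)                  # standard-normal scores that carry the time dependence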

To get the likelihood of some observed $$E_i$$ we, as usual, want to specify the log-density of the vector of $$E_i$$ values. To do this, we need to go through the CDF. The CDF of a vector is a weird function: it takes one argument for each dimension of the vector, and it specifies the probability that the random variable is less than the observed value in each dimension. The dependency structure can be thought of in terms of a copula $$C(u_1,u_2,u_3,\dots,u_n)$$. The $$u_i$$ values are all marginally uniform, but the joint distribution is not uniform over the hypercube. In fact $$C(u_1,u_2,u_3,\dots,u_n) = pmvnorm(x_1,x_2,x_3,\dots,x_n; \sigma)$$ with $$\sigma$$ coming from the gaussian process covariance function. Hence, this $$C$$ could be called a “Gaussian Process Copula” and the process is called a gaussian copula process.
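To make that structure concrete, here is a sketch in R of how the copula CDF could be evaluated, assuming a squared-exponential covariance in time with a hypothetical length scale ell (my choice for illustration, not necessarily the covariance function used in the actual fit):

    ## Sketch: gaussian copula CDF under a squared-exponential GP covariance.
    ## t_obs and ell are illustrative; x is reused from the previous sketch.
    library(mvtnorm)

    t_obs <- seq(0, 1, length.out = 50)                       # observation times
    ell   <- 0.1                                              # hypothetical correlation length in time
    Sigma <- exp(-outer(t_obs, t_obs, "-")^2 / (2 * ell^2))   # unit-diagonal GP correlation matrix

    C_val <- pmvnorm(upper = x, sigma = Sigma)                # C(u_1,...,u_n) = pmvnorm(x_1,...,x_n; sigma)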

To get the likelihood of a particular vector of $$E_i$$ we can differentiate

\[ \frac{\partial^n pmvnorm(x_1\dots x_n; \sigma)}{\partial E_1 \partial E_2 \cdots \partial E_n} \]

When we differentiate this thing we get $$dmvnorm(x_1\dots x_n; \sigma) \prod_i \frac{\partial x_i}{\partial E_i}$$, and since $$x_i = qnorm(pt(E_i/s, d))$$ the chain rule gives $$\frac{\partial x_i}{\partial E_i} = \frac{dt(E_i/s, d)}{s\, dnorm(x_i)}$$. Using the mvtnorm package in R I can calculate the log of all this mess, hand that to the mcmc package, and do Bayesian inference. Running last night, it seems to have produced a significantly better fitting model that can move through parameter space much more easily, apparently because the error dependency makes the likelihood landscape less rough.
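For concreteness, here is a minimal sketch of that log-density calculation with the mvtnorm package; the function and variable names are mine, and the real code surely differs in detail. Something like this, plus the log prior, is what gets handed to the mcmc package as the log unnormalized posterior:

    ## Sketch of the copula log-likelihood: joint normal density of the scores
    ## times the Jacobian of the marginal transform, all on the log scale.
    library(mvtnorm)

    copula_loglik <- function(E, s, d, Sigma) {
      u  <- pt(E / s, df = d)                      # marginal t CDF of the scaled residuals
      x  <- qnorm(u)                               # standard-normal scores
      lp <- dmvnorm(x, sigma = Sigma, log = TRUE)  # dmvnorm(x_1,...,x_n; sigma)
      ## Jacobian term: dx_i/dE_i = dt(E_i/s, d) / (s * dnorm(x_i))
      lp + sum(dt(E / s, df = d, log = TRUE) - log(s) - dnorm(x, log = TRUE))
    }

    ## e.g. copula_loglik(E, s, d, Sigma) using the quantities from the sketches above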

Many thanks to “Corey” on Gelman’s blog for the suggestion; any inaccuracies in the above are entirely due to my attempting to figure the details out on my own.

 

One Response
  1. Corey permalink
    June 18, 2013

    Glad it turned out to be useful!

Comments are closed.