# On Incorporating Assertions in Bayesian Models

I’ve been thinking a lot recently about how to specify a prior over functions. One way is to specify a Gaussian Process, but this can be quite computationally heavy, and when the function in question is fairly smooth and simple, it’s also kind of overkill.

The typical method for specifying a function is via a basis expansion. For example in the R function lm you can specify a third order polynomial model for y as $$y \sim x + x^2 + x^3$$. This means $$y = a + bx + cx^2 + dx^3 + \epsilon$$ and uses linear least squares to fit.

Well, when fitting this sort of thing in a full Bayesian model, you should provide priors over a,b,c,d. Typically we can specify some very simple thing about each, such as maybe

a ~ normal(0,100) if you know that the size of the coefficients is typical a small multiple of 100.

But usually we have some information not about the individual coefficients themselves, but about some property of the function. For example, we might know that the function is “near” to some constant value, doesn’t change too rapidly, and doesn’t wiggle too rapidly:

\[\int_0^1 f(x) dx \sim \mu\]

\[\sqrt{\int_0^1 (\frac{d}{dx} f(x))^2 dx} \sim S\]

\[\sqrt{\int_0^1 (\frac{d^2}{dx^2}f(x))^2 dx}\sim Q\]

From the perspective of MCMC, we can calculate numerical quantities either using symbolic methods or by numerical integration, and then place probabilities on these outcomes. Note, these quantities are in general nonlinear functionals of the f(x) function. The parameter space in general will be high dimensional (let’s say for example 15 coefficients in a polynomial fit) and the number of conditions for which we have prior information are less than 15, so there is no one-to-one mapping from coefficients to $$\mu,S,Q$$. As such, the prior you’re going to place here is in part determined by the “multiplicity” of the outcome. Some $$\mu$$ values can occur in more ways than other $$\mu$$ values… This is just a fact of life about a mismatch between our state of information and the dimensionality of the parameters.

As an example, here’s a complete R script using rstan and ggplot2 to put a distribution on a 3rd order polynomial on the region $$x \in [0,1]$$. To calculate the mean value, RMS slope, and RMS curvature I use a super-simple 3 point midpoint integration rule. I specify that the mean value is normally distributed around 100, the RMS slope is of typical size 10, and the RMS second derivative is on order 1000. Run the script and you’ll get 20 random curves plotted, you’ll see that they are “flattish” with some mild wiggles on the scale of a function whose values are around 100.

Add some data generated by your favorite function with some random measurement error, and add a normal likelihood, and see what happens. Add some additional parameters aa,bb,cc,dd which have uniform(-1e6,1e6) priors and the same normal likelihood (essentially the least-squares solution). Compare the posterior fits especially with only a few data points when you provide smoothness information compared to the unconstrained least squares solution.

Does the very low quality numerical integration routine matter? What happens if you add some additional higher order terms? If you have N data points and N coefficients, you can fit the curve directly through the points. That will inevitably be the least squares solution. Does that happen when you provide this kind of smoothness prior?

Note: it’s problematic to do basis expansion in the 1,x,x^2,x^3… basis, usually you’d work with an orthogonal polynomial, and deal with numerical evaluation of the polynomial using special techniques for stability and to prevent cancellation problems etc. But this is just an example, and the technique should work well even for other types of basis expansions, such as radial basis functions which are often a good idea for strangely shaped domains (2D or 3D functions for example).

Above are pairs of scatterplots for samples of a,b,c,d, showing how the effective prior looks.

library(rstan) library(ggplot2) stanmodl <- " functions{ real f(real x,real a, real b, real c, real d){ return a + b*x + c*x^2 + d*x^3; } real fp(real x, real a, real b, real c, real d){ return(b + 2*c*x+3*d*x^2); } real fp2(real x, real a, real b, real c, real d){ return 2*c+6*d*x; } real favg(real a, real b, real c, real d){ return((f(1.0/6,a,b,c,d) + f(0.5,a,b,c,d) + f(1-1.0/6,a,b,c,d))/3); } real slfun(real a, real b, real c, real d){ return(sqrt((fp(1.0/6,a,b,c,d)^2 + fp(0.5,a,b,c,d)^2 + fp(1-1.0/6,a,b,c,d)^2)/3)); } real qfun(real a, real b, real c, real d){ return sqrt((fp2(1.0/6,a,b,c,d)^2 + fp2(.5,a,b,c,d)^2 + fp2(1-1.0/6,a,b,c,d)^2)/3.0); } } parameters{ real a; real b; real c; real d; } transformed parameters{ real fbar; real q; real slavg; fbar = favg(a,b,c,d); q = qfun(a,b,c,d); slavg = slfun(a,b,c,d); } model{ fbar ~ normal(100.0,10); slavg ~ exponential(1/10.0); q ~ exponential(1/1000.0); } " samps <- stan(model_code=stanmodl); vals <- extract(samps); plt <- ggplot(data=data.frame(x=0),aes(x=x))+xlim(0,1) funs <- list(); for(i in sample(1:2000,20)){ a <- vals$a[i]; b <- vals$b[i]; c <- vals$c[i]; d <- vals$d[i]; print(a); funs <- c(funs, (function(a,b,c,d){a <- a; b <- b; c <- c; d <- d; function(s){return(a+b*s+c*s^2+d*s^3)}})(a,b,c,d)) } print(funs); for(i in 1:length(funs)){ plt <- plt+stat_function(fun=funs[[i]]) } print(plt);

Some really weird R scoping issue causes you to plot 20 copies of just the LAST function if you don’t do that rigamorole a <- a; b <- b; etc. You need to create local variables for a,b,c,d for the closure to capture, and just having them in the function arguments doesn't do it.