The Implicit Function Theorem and Likelihood Functions

2017 September 22
by Daniel Lakeland

In Bayesian statistical modeling we often use the symbol ~ which denotes a kind of “statistically equal to”. Consider the following:

\[y = ax+b+\epsilon\]

If $$\epsilon = 0$$ then this is an equation of a line, whereas if we say $$\epsilon \sim N(0,1)$$ for example then this denotes a line with errors that have a certain range of credible sizes and are centered around 0. Well this statement about the distribution of $$\epsilon$$ doesn’t alter the algebraic properties of the symbolic expression $$y=ax+b+\epsilon$$ and so that equality still respects all the usual algebraic rules.

\[y-ax-b = \epsilon\]

Is true, and so $$y-ax-b \sim N(0,1)$$ is true by substitution of $$\epsilon$$.

In general you might have a fairly complicated relationship, something like

\[F(x,y,a,b) = \epsilon\]

With F a nonlinear non-separable relationship between the quantities, for example

\[y^2 -\frac{x}{y}\mathrm{atan}(ay)+\frac{b}{a} = \epsilon\]

Or something equally nasty from the perspective of trying say “solve for y”. We can suppose that y is our data, and x a covariate and a,b are parameters. What do we make of this relationship in a Bayesian model and how do we use it?

In Stan, if you create a transformed parameter F = y^2-x/y*atan(a*y)+b/a and then say

F ~ normal(0,1)

You will get a message about how the left hand side of this sampling statement contains a transform of a parameter, and if it’s nonlinear you need to include the Jacobian of the transformation in the target. This warning message is designed to alert you to something you need to do when you re-parameterize. But a re-parameterization is a purely formal transformation. It doesn’t alter the meaning of your model, it alters the way in which the model is expressed. For example if you have y = ax+b and you change this to y/a = x +b/a and then rename y/a = y’ and b/a = b’ and say y’ = x + b’, this is a formal transformation that doesn’t alter the meaning of the equation (provided a is not 0). On the other hand, if you do y = ax^2 + b then you’re changing your model.

The statement F ~ normal(0,1) above is not a formal transformation, it is in fact a statement about your data, a kind of likelihood, it’s just an implicit statement about your data.

Although we can’t necessarily solve our equation for y symbolically, there is a theorem called the implicit function theorem which enables us to say that as long as our relationship F(x,y,a,b) is sufficiently well behaved, then in some region around any given point x’,a’,b’  there exists a function y = f(x,a,b) even if we don’t know how to express it. For example when the distribution for a is well separated from 0 then we won’t be dividing by a=0 and so our expression F is well behaved. And so, our statement

F ~ normal(0,1) is really a statement about

\[y-f(x,a,b) = \epsilon\]

And could be re-expressed as

\[y \sim N(f(x,a,b),1)\]

Which for y a data value is obviously the usual kind of likelihood expression. The problem is, although this $$f$$ function exists, that doesn’t mean we know what it is. We do, however, know the relationship $$F(x,y,a,b)$$ and so why not do

\[F(x,y,a,b) \sim N(0,1)\]

Which has exactly the same meaning.

Note, to the best of our knowledge we have decided to model $$\epsilon \sim N(0,1)$$ which is a modeling choice, and subject to questions regarding whether it expresses valid facts about the world more than it is subject to questions about mathematical correctness. This fact isn’t derived mathematically from anything, it’s assumed and so it should be questioned primarily on modeling grounds more than anything else. There can be mathematical facts that are relevant I suppose, but the main question is “is this a good model” not “did you derive the N(0,1) correctly” since it isn’t derived.

All of this is another way to think about what I called “declarative models” a while back when I first started thinking about this topic.


Comments are closed.