# Deformation factors vs Jacobians… and all that jazz

Chris Wilson commented on some of my comments about Jacobians and masking in prior specification. The discussion is long past, so I’m not sure others will chime in, but I figured I’d put my response here.

In the past I’ve never seemed able to express this idea effectively; I hope this was more effective. The green screen idea is a really good analogy… you calculate some quantity (greenness, or Q(X)) and decide whether to upweight or downweight a given region of space based on some non-negative function of that quantity.

So formally, for any prior p_base(X)/Z_base, you can always use p_base(X)*deformation_factor(Q(X))/Z_deformed as a prior, provided deformation_factor(Q(X)) is a non-negative, bounded function that isn’t zero everywhere the base prior has mass. You’re guaranteed a normalizable distribution if you started with a normalizable one and nothing is multiplied by anything bigger than some constant. The only formal requirement for a prior is that it be normalizable.
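Writing f for deformation_factor and assuming 0 ≤ f ≤ M for some constant M, the normalizability argument is a one-line bound:

$$
Z_{\text{deformed}} = \int p_{\text{base}}(X)\, f(Q(X))\, dX \;\le\; M \int p_{\text{base}}(X)\, dX = M\, Z_{\text{base}} < \infty
$$

Nothing in the bound depends on whether Q is linear, invertible, or even scalar-valued.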

The easiest interpretation is when deformation_factor lies between 0 and 1. Regions where deformation_factor = 1 are not squashed at all; regions where it is less than 1 are squashed… and if you are doing MCMC you don’t need to worry about the normalization, so it’s just a squishing factor.
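A minimal sketch of the MCMC point, under made-up assumptions: X is two-dimensional with an independent standard normal base prior, Q(X) = x1 * x2 is a hypothetical nonlinear quantity, and the hypothetical deformation factor equals 1 for |Q| ≤ 1 and decays smoothly beyond that. A plain random-walk Metropolis sampler only needs the unnormalized log density log p_base(X) + log deformation_factor(Q(X)); Z_deformed never appears.

```python
import numpy as np

def log_base_prior(x):
    # Independent standard normal base prior on x = (x1, x2), up to a constant
    return -0.5 * np.sum(x ** 2)

def Q(x):
    # Hypothetical nonlinear quantity: the product x1 * x2
    return x[0] * x[1]

def log_deformation(q, limit=1.0, width=0.1):
    # log of a hypothetical factor in (0, 1]: exactly 1 when |q| <= limit,
    # decaying like a Gaussian tail once |q| exceeds the limit
    excess = max(0.0, abs(q) - limit)
    return -0.5 * (excess / width) ** 2

def log_deformed_prior(x):
    # Unnormalized target: log p_base(X) + log deformation_factor(Q(X))
    return log_base_prior(x) + log_deformation(Q(x))

def metropolis(log_density, x0, n_steps, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_density(x)
    draws = np.empty((n_steps, x.size))
    for i in range(n_steps):
        proposal = x + step * rng.standard_normal(x.size)
        lp_prop = log_density(proposal)
        # Accept with probability min(1, exp(lp_prop - lp)); any normalizing constant cancels
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        draws[i] = x
    return draws

draws = metropolis(log_deformed_prior, [0.0, 0.0], 20000)
q = draws[:, 0] * draws[:, 1]
print("fraction of draws with |Q| > 1.5:", np.mean(np.abs(q) > 1.5))
```

The step size and number of iterations are arbitrary choices for illustration, not tuned values.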

Then the non-formal question is just: did this actually express the knowledge I have? The easiest way to determine this is to sample from the prior and see whether the samples make sense. You have to do that whether or not you are using nonlinear transformations.
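One way to do that prior check, sketched under the same hypothetical setup (standard normal base prior on (x1, x2), Q = x1 * x2, a made-up factor in (0, 1]): when the factor never exceeds 1, drawing from the base prior and keeping each draw with probability deformation_factor(Q(X)) gives exact draws from the deformed prior, and the acceptance rate estimates Z_deformed / Z_base.

```python
import numpy as np

def deformation_factor(q, limit=1.0, width=0.1):
    # Hypothetical squishing factor in (0, 1]; equals 1 wherever |q| <= limit
    excess = np.maximum(0.0, np.abs(q) - limit)
    return np.exp(-0.5 * (excess / width) ** 2)

rng = np.random.default_rng(1)
n = 100_000
base = rng.standard_normal((n, 2))        # draws from p_base: independent N(0, 1)
q_base = base[:, 0] * base[:, 1]          # Q(X) = x1 * x2
keep = rng.uniform(size=n) < deformation_factor(q_base)
deformed = base[keep]                     # exact draws from p_base * f / Z_deformed
q_def = q_base[keep]

print("acceptance rate (estimates Z_deformed / Z_base):", keep.mean())
print("P(|Q| > 1) under base prior:    ", np.mean(np.abs(q_base) > 1))
print("P(|Q| > 1) under deformed prior:", np.mean(np.abs(q_def) > 1))
```

Looking at histograms of `deformed` (not just of Q) is the part that answers "did this express what I know?".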

The fact that Q(X) is a nonlinear function of the X vector doesn’t enter into this at all, since deformation_factor is just squishing the regions that produce “weird” Q(X) values and leaving alone the regions that produce “ok” Q(X) values. It isn’t directly a probability measure; it’s a dimensionless scaling factor. “Adding” a Jacobian correction is another way of saying “you squished this the wrong amount”… umm… says who?

A similar thing occurs when you use a nonlinear Q(data,params) in a likelihood function L(Q(data,params)). This expresses the probability of a particular value of the transformed “summary statistic” Q(data,params), conditional on *both the data and the parameters*, so it’s just a squishing factor for how much to downweight the prior… which, it turns out, is what a likelihood is. You might call this a “pseudo-likelihood”, or you might just call it a way to directly express a joint probability distribution over data and parameters without factoring it into p(data | params) p(params); instead it’s p(Q(data,params),params).
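A hedged sketch of such a target, with every modeling choice hypothetical: Q(data, params) is the z-score of the sample mean under (mu, sigma), L is a standard normal density placed directly on Q, and the priors are assumed to be mu ~ Normal(0, 10) and sigma ~ Exponential(1). The log target is simply log L(Q(data, params)) + log p(params).

```python
import numpy as np

def normal_logpdf(x, loc=0.0, scale=1.0):
    # Log density of Normal(loc, scale), written out to avoid extra dependencies
    z = (x - loc) / scale
    return -0.5 * z ** 2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

def Q(data, mu, sigma):
    # Hypothetical summary: z-score of the sample mean under (mu, sigma)
    return (np.mean(data) - mu) / (sigma / np.sqrt(len(data)))

def log_L(q):
    # Hypothetical L: a standard normal density on the summary Q(data, params)
    return normal_logpdf(q)

def log_prior(mu, sigma):
    # Assumed p(params): mu ~ Normal(0, 10), sigma ~ Exponential(1)
    if sigma <= 0:
        return -np.inf
    return normal_logpdf(mu, 0.0, 10.0) - sigma

def log_target(data, mu, sigma):
    # log[L(Q(data, params)) p(params)]: no Jacobian term is added for Q
    return log_L(Q(data, mu, sigma)) + log_prior(mu, sigma)

rng = np.random.default_rng(2)
data = rng.normal(1.0, 2.0, size=30)   # simulated stand-in data
print(log_target(data, 1.0, 2.0))
```

Because this Q only looks at the sample mean, two datasets with the same mean give the same target; that is the sense in which it is a model of the summary rather than of every data point.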

This is more or less what’s going on in approximate Bayesian computation (ABC), where the matching is often expressed in terms of low-dimensional summary statistics. If all your observed data is low-dimensional summary statistics, then you’d have just called this the “likelihood”; but if you happened to observe the high-dimensional version of your data and still decide to match on the low-dimensional summary, you call it “approximate”. It’s really just a different but related model.
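A minimal ABC rejection sketch, assuming a hypothetical normal model with known unit noise, a Normal(0, 2) prior on mu, the sample mean as the summary statistic, and an arbitrary tolerance eps. Each prior draw is kept only when its simulated summary lands within eps of the observed one, so the 0/1 acceptance indicator acts as a bounded, non-negative squishing factor on the summary mismatch.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20
observed = rng.normal(1.0, 1.0, size=n)   # hypothetical "full" observed data
s_obs = observed.mean()                    # low-dimensional summary statistic

n_draws, eps = 50_000, 0.05
mu = rng.normal(0.0, 2.0, size=n_draws)                      # prior draws: mu ~ N(0, 2)
simulated = rng.normal(mu[:, None], 1.0, size=(n_draws, n))  # one fake dataset per draw
s_sim = simulated.mean(axis=1)

# Keep draws whose simulated summary matches the observed summary to within eps
accepted = mu[np.abs(s_sim - s_obs) < eps]
print(len(accepted), accepted.mean(), s_obs)
```

In this toy case the sample mean is sufficient for mu, so matching on it loses nothing beyond the eps tolerance; with a non-sufficient summary the result is a genuinely different model from the full-data one.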

You certainly shouldn’t be doing Jacobian magic on Q(data,params), because that would express a different model. If your model really is:

L(Q(data,params))p(params)

and someone comes along and tells you it should be

J(params) L(Q(data,params)) p(params)

they are not telling you “you made a formal calculation mistake” they are telling you “I don’t like your model, use this other one instead” which is probably wrong.
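To make "a different model" concrete, here is a hedged grid sketch reusing a hypothetical setup: Q is the z-score of the sample mean, L is a standard normal density on Q, and the priors are assumed to be mu ~ Normal(0, 10) and sigma ~ Exponential(1). Taking J(params) = sqrt(n)/sigma (the derivative of Q with respect to the sample mean) turns the second target into the ordinary density of the sample mean, and the posterior for sigma moves.

```python
import numpy as np

def normal_logpdf(x, loc=0.0, scale=1.0):
    # Log density of Normal(loc, scale)
    z = (x - loc) / scale
    return -0.5 * z ** 2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

rng = np.random.default_rng(3)
data = rng.normal(1.0, 2.0, size=30)     # simulated stand-in data
n, ybar = len(data), np.mean(data)

mu = np.linspace(-10.0, 10.0, 401)
sigma = np.linspace(0.05, 10.0, 400)
MU, SIG = np.meshgrid(mu, sigma, indexing="ij")

q = (ybar - MU) / (SIG / np.sqrt(n))                    # Q(data, params)
log_prior = normal_logpdf(MU, 0.0, 10.0) - SIG          # assumed priors
log_model = normal_logpdf(q) + log_prior                # L(Q) p(params)
log_jac_model = log_model + np.log(np.sqrt(n) / SIG)    # J(params) L(Q) p(params)

def sigma_posterior_mean(log_post):
    # Normalize the grid weights and average sigma over them
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    return np.sum(w * SIG)

m1 = sigma_posterior_mean(log_model)
m2 = sigma_posterior_mean(log_jac_model)
print("E[sigma] without J:", m1)
print("E[sigma] with J:   ", m2)
```

Since the extra factor decreases in sigma, the posterior mean of sigma can only go down; neither target is a "corrected" version of the other, they are two models.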

Jacobian corrections are verifiable formal consequences of model transformations… These other things are non-formal modeling choices whose goodness is a matter of opinion, in the same way that linear regression’s goodness for a particular problem is a matter of opinion. Often Q(data,params) is a scalar and {data,params} is a high-dimensional vector. There is no Jacobian determinant (the change-of-variables correction) for a transformation from high dimensions to a scalar, because the transformation isn’t invertible.
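A hedged illustration of the distinction, assuming an Exponential(1) prior on a hypothetical scale parameter sigma. For the invertible map u = log(sigma), the Jacobian correction is a checkable consequence: p_u(u) = exp(-exp(u)) * exp(u), and Monte Carlo agrees with it. For a map to a scalar such as Q(x1, x2) = x1 * x2, the derivative is a 1×2 matrix, which has no determinant at all.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = rng.exponential(1.0, size=200_000)   # sigma ~ Exponential(1)
u = np.log(sigma)                             # invertible transform u = log(sigma)

# Change of variables: p_u(u) = p_sigma(exp(u)) * |d sigma / d u| = exp(-exp(u)) * exp(u),
# whose integral over [a, b] is exp(-exp(a)) - exp(-exp(b))
a, b = -1.0, 0.5
exact = np.exp(-np.exp(a)) - np.exp(-np.exp(b))
empirical = np.mean((u > a) & (u < b))
print("exact:", exact, "empirical:", empirical)

# For Q(x1, x2) = x1 * x2 the derivative at (x1, x2) = (3, 2) is the 1x2 matrix [x2, x1]
dQ = np.array([[2.0, 3.0]])
try:
    np.linalg.det(dQ)
    has_det = True
except np.linalg.LinAlgError:
    has_det = False   # numpy refuses: determinants only exist for square matrices
print("determinant defined for dQ:", has_det)
```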

Hi Daniel, I see what you mean here. Thanks! I still think you should write up your ideas in the context of a very specific model (and data) and show the implications of using versus not using a Jacobian in a case study, in addition to the theory. I think you are on to something useful here, and this is honestly something I worry about as Stan becomes more and more mainstream. Reviewers are gonna start asking about Jacobian corrections sooner or later, and if the default view is “always need ’em!”, it’s gonna be hard to explain otherwise.

What happens most often in practice is that the transformation is NOT one-to-one, so the whole concept of a Jacobian – as normally defined – doesn’t apply.

So I guess what I worry about most are the cases where you could define a one-to-one mapping (and a suitable Jacobian), but, for the reasons you’ve outlined here, that doesn’t actually correspond to the intended model.