Chris Wilson commented on some of my comments related to Jacobians and Masking in prior specification… It’s kind of long past the discussion, so I’m not sure if others will chime in. I figured I’d put my response here.

In the past I’ve never seemed to be able to express this idea effectively; I hope this attempt was more effective. The green screen idea is really a good analogy… you calculate some quantity (greenness, or Q(X))… and decide whether you want to upweight a given region of space or downweight that region of space based on some strictly non-negative function of the quantity…

So formally, for any prior p_base(X)/Z_base, you can always use p_base(X)*deformation_factor(Q(X))/Z_deformed as a prior provided deformation_factor(Q(X)) is a strictly non-negative bounded function that isn’t zero everywhere the base prior has mass (you’re guaranteed to have a normalizable distribution if it started out normalizable and nothing is multiplied by anything bigger than some constant). The only formal requirement for a prior is that it be normalizable.

The easiest interpretation is when deformation_factor is between 0 and 1. Regions where deformation_factor is near 1 are not squashed at all, regions where it is less than 1 are squashed… and if you are doing MCMC you don’t need to worry about the normalization, so it’s just a squishing factor.
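To make the squishing factor concrete, here is a minimal Python sketch of the unnormalized log density an MCMC sampler would see. Everything specific in it is an assumption picked for illustration: the base prior (independent standard normals), the choice of Q (the Euclidean norm of X), and the shape of deformation_factor (a Gaussian bump around Q = 2).

```python
import math

def log_p_base(x):
    """Unnormalized log density of a hypothetical base prior: independent standard normals."""
    return -0.5 * sum(xi * xi for xi in x)

def Q(x):
    """Hypothetical nonlinear summary of the parameter vector: its Euclidean norm."""
    return math.sqrt(sum(xi * xi for xi in x))

def deformation_factor(q, center=2.0, scale=0.5):
    """Dimensionless weight between 0 and 1: equal to 1 at q == center, decaying away from it."""
    return math.exp(-0.5 * ((q - center) / scale) ** 2)

def log_p_deformed(x):
    """Unnormalized log density of the deformed prior.

    Z_base and Z_deformed never appear: MCMC only needs ratios of densities.
    """
    w = deformation_factor(Q(x))
    log_w = math.log(w) if w > 0.0 else -math.inf  # guard against underflow to 0
    return log_p_base(x) + log_w

# A point with Q(x) == 2 is not squashed; the origin (Q(x) == 0) is squashed a lot.
print(log_p_deformed([2.0, 0.0]), log_p_base([2.0, 0.0]))
print(log_p_deformed([0.0, 0.0]), log_p_base([0.0, 0.0]))
```

Note that nothing here asks how Q was computed from X; the factor is applied pointwise in X space.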

Then the non-formal question is just: did this actually express the knowledge I have? The easiest way to determine this is to sample from the prior and see if the samples make sense. You have to do that whether or not you are using nonlinear transformed functions.
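One way to do that check, sketched in Python under stated assumptions: when deformation_factor lies between 0 and 1, you can draw from the base prior and keep each draw with probability deformation_factor(Q(X)), which yields exact draws from the deformed prior. The base prior, Q, and the factor below are hypothetical choices for illustration, not anything from the original discussion.

```python
import math
import random

def sample_base(rng, dim=2):
    """Draw from a hypothetical base prior: independent standard normals."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def Q(x):
    """Hypothetical nonlinear summary: the Euclidean norm of X."""
    return math.sqrt(sum(xi * xi for xi in x))

def deformation_factor(q, center=2.0, scale=0.5):
    """Weight between 0 and 1 that prefers Q values near `center`."""
    return math.exp(-0.5 * ((q - center) / scale) ** 2)

def sample_deformed_prior(n, seed=0):
    """Rejection sampling: keep a base draw with probability deformation_factor(Q(x))."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        x = sample_base(rng)
        if rng.random() < deformation_factor(Q(x)):
            draws.append(x)
    return draws

draws = sample_deformed_prior(2000)
q_values = sorted(Q(x) for x in draws)
median_q = q_values[len(q_values) // 2]
print("median of Q under the deformed prior:", median_q)
```

If the printed summary (or a histogram of q_values) doesn't look like what you believe about Q, the fix is to change the factor or the base prior until it does.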

The fact that Q(X) is a nonlinear function of the X vector doesn’t enter into this at all… since deformation_factor is just squishing those regions that result in “weird” Q(X) values, and not squishing regions that result in “ok” Q(X) values. It isn’t directly a probability measure, it’s a dimensionless scaling factor. “Adding” a Jacobian correction is another way of saying “you squished this the wrong amount”… umm… says who?

A similar thing occurs when you use a nonlinear Q(data,params) in a likelihood function L(Q(data,params))… This expresses a conditional probability of a particular transformed “summary statistic” Q(data,params), conditional on both the data and the parameters, so it’s just a squishing factor for how much to downweight the prior… which, it turns out, is what a likelihood is. You might call this a “pseudo-likelihood”, or you might just call it a way to directly express a joint probability distribution on data and parameters without factoring it into p(data | params) p(params); instead it’s p(Q(data,params),params).
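Here is a rough Python sketch of that "likelihood as squishing factor" view: draw parameters from the prior, then weight each draw by L(Q(data,params)). The data values, the Normal(0, 3) prior on a single parameter mu, the choice Q = mean(data) - mu, and the Gaussian shape of L are all assumptions made up for this example.

```python
import math
import random

# Hypothetical observed data.
data = [1.2, 0.7, 2.1, 1.9, 1.4, 0.9, 1.6, 2.3, 1.1, 1.8]

def Q(data, mu):
    """Hypothetical summary: how far the sample mean sits from the parameter mu."""
    return sum(data) / len(data) - mu

def L(q, scale=0.5):
    """Squishing factor between 0 and 1 applied to prior draws, as a function of Q only."""
    return math.exp(-0.5 * (q / scale) ** 2)

def sample_prior(rng):
    """Hypothetical prior on mu: Normal(0, 3)."""
    return rng.gauss(0.0, 3.0)

def weighted_posterior_mean(data, n_draws=20000, seed=1):
    """Importance sampling: prior draws reweighted by L(Q(data, mu))."""
    rng = random.Random(seed)
    mus = [sample_prior(rng) for _ in range(n_draws)]
    weights = [L(Q(data, mu)) for mu in mus]
    return sum(w * mu for w, mu in zip(weights, mus)) / sum(weights)

posterior_mean = weighted_posterior_mean(data)
print("posterior mean of mu:", posterior_mean)
```

The weights downweight prior draws whose Q value looks "weird", which is the whole job of the likelihood here.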

This is more or less what’s going on in approximate Bayesian computation (ABC), where the matching is often expressed in terms of low-dimensional summary statistics. If all you had observed were low-dimensional summary statistics… then you’d have just called this the “likelihood”, but if you happened to observe the high-dimensional version of your data and you still decide to match on the low-dimensional summary… you call it “approximate”. It’s really just a different but related model.
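For comparison, a bare-bones ABC rejection sampler looks something like the Python sketch below. It is only an illustration under assumptions I'm inventing here: a Uniform(-5, 5) prior on mu, a Normal(mu, 1) simulator, the sample mean as the summary statistic, and a tolerance eps of 0.1.

```python
import random

# Hypothetical observed data.
observed = [1.2, 0.7, 2.1, 1.9, 1.4, 0.9, 1.6, 2.3, 1.1, 1.8]

def summary(xs):
    """Low-dimensional summary statistic: the sample mean."""
    return sum(xs) / len(xs)

def simulate(mu, n, rng):
    """Hypothetical forward model: n independent Normal(mu, 1) observations."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]

def abc_rejection(observed, n_proposals=20000, eps=0.1, seed=2):
    """Keep prior draws whose simulated summary lands within eps of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_proposals):
        mu = rng.uniform(-5.0, 5.0)  # hypothetical prior on mu
        s_sim = summary(simulate(mu, len(observed), rng))
        if abs(s_sim - s_obs) < eps:
            accepted.append(mu)
    return accepted

accepted = abc_rejection(observed)
abc_mean = sum(accepted) / len(accepted)
print(len(accepted), "accepted; mean mu:", abc_mean)
```

The accept/reject rule is a 0-or-1 squishing factor on the summary, and nothing in it cares that the summary throws away most of the data.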

You certainly shouldn’t be doing Jacobian magic on the Q(data,params), because that would express a different model. If your model really is:

L(Q(data,params))p(params)

and someone comes along and tells you it should be

J(params) L(Q(data,params)) p(params)

they are not telling you “you made a formal calculation mistake”; they are telling you “I don’t like your model, use this other one instead”, which is probably wrong.

Jacobian corrections are verifiable formal consequences of model transformations… These other things are non-formal modeling choices whose goodness is a matter of opinion in the same way that linear regression’s goodness for a particular problem is a matter of opinion. Often Q(data,params) is a scalar and {data,params} is a high dimensional vector. There is no Jacobian determinant for a transformation from high dimensions to a scalar, because the transformation isn’t invertible.
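For contrast, here is what "verifiable formal consequence" means in a case where a Jacobian really is required. This Python sketch uses a textbook example I'm choosing for illustration: X is standard normal and Y = exp(X), an invertible one-dimensional transformation. The density of Y must be p_X(log y) times |d log(y)/dy| = 1/y, and you can check that against simulation.

```python
import math
import random

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def density_y_with_jacobian(y):
    """Density of Y = exp(X): p_X(log y) times the Jacobian factor 1/y."""
    return normal_pdf(math.log(y)) / y

def density_y_without_jacobian(y):
    """Same expression with the Jacobian factor dropped (a formal mistake here)."""
    return normal_pdf(math.log(y))

def integrate(f, a, b, n=10000):
    """Midpoint-rule numerical integration of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

rng = random.Random(3)
ys = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(50000)]

# Fraction of simulated Y values in [1, 2] versus the two candidate densities.
empirical = sum(1 for y in ys if 1.0 <= y <= 2.0) / len(ys)
with_jacobian = integrate(density_y_with_jacobian, 1.0, 2.0)
without_jacobian = integrate(density_y_without_jacobian, 1.0, 2.0)
print(empirical, with_jacobian, without_jacobian)
```

In this setting the simulation tells you which formula is right. There is no analogous check for a deformation factor or a pseudo-likelihood on Q, because there the expression you wrote down is itself the definition of the model.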
