Strong and Weak Distributional Assumptions and Data Dependent Priors

2016 March 29
by Daniel Lakeland

In the previous post I discussed priors on parameters that are formed using the data. We saw how sometimes these are absolutely wrong, in the same way that it’s wrong to use knowledge you don’t have to form a prior without the data. Suppose you have knowledge that the mean of some treatment effect for a drug is somewhere between, say, -2 and 2 on some scale. It would be wrong, given this knowledge, to say:

eff ~ normal(0,0.25);

That’s too specific; it uses information you don’t have. By the same token, if you know the mean is 0 but don’t know the scale of variation of some data, it would be wrong to say:

data ~ normal(0, sd(data));

or even

s ~ normal(sd(data),0.000001);

data ~ normal(0,s);

In both cases you are presumably using knowledge that you don’t have: the knowledge that the sample standard deviation is virtually exactly equal to the population sd.
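To see how far the sample standard deviation can sit from the population value, here is a minimal Python sketch (not from the original post). It assumes normally distributed data with a true sd of 1, and the function name `sample_sd_spread` and the sample size of 10 are illustrative choices only.

```python
import random
import statistics

def sample_sd_spread(n, true_sd=1.0, reps=2000, seed=1):
    """Draw `reps` samples of size n from normal(0, true_sd) and return each sample's sd."""
    rng = random.Random(seed)
    return [
        statistics.stdev([rng.gauss(0.0, true_sd) for _ in range(n)])
        for _ in range(reps)
    ]

# Hypothetical setting: 10 observations per data set.
sds = sample_sd_spread(n=10)

# Spread of the sample sd relative to its typical value.
relative_spread = statistics.stdev(sds) / statistics.mean(sds)
print(f"relative spread of the sample sd at n=10: {relative_spread:.2f}")
```

Under these assumptions the sample sd typically misses the population sd by something on the order of 20 percent or more, which is nothing like the 0.000001 claimed in the prior above.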

Weak Distributional Assumptions:

In Bayesian statistics we need not use a model for our data which is the same as the “true” population distribution. In many cases there hardly even exists such a “true” distribution; for example, the process could be time dependent. And in any case, we often don’t have enough data to estimate and validate a frequency distribution. I’ve given an example previously using cartons of orange juice. There, the population distribution is strange, and the likelihood is based on an exponential distribution, which has support in areas that the population distribution doesn’t have. They don’t look at all alike.

In many cases, the experiments are not repeatable in any sense, and there is no well-defined fixed population. Even if there were, we don’t know its distribution (as in the orange juice example). So it tends to be a good idea to use maximum entropy distributions, because among all the distributions that satisfy the same constraints they are the least specific you can get.
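As a small illustration of “least specific for their class”, the sketch below (an addition, not from the original post) compares the differential entropy of three distributions on the positive numbers that all share the same mean. The closed-form entropies are standard results; the choice of comparison distributions and the function names are assumptions made for the example.

```python
import math

def entropy_exponential(mean):
    # Exponential with the given mean: entropy = 1 + ln(mean)
    return 1.0 + math.log(mean)

def entropy_uniform(mean):
    # Uniform on [0, 2*mean] has the given mean: entropy = ln(2*mean)
    return math.log(2.0 * mean)

def entropy_half_normal(mean):
    # Half-normal with scale sigma has mean sigma*sqrt(2/pi)
    sigma = mean * math.sqrt(math.pi / 2.0)
    return 0.5 * math.log(math.pi * sigma**2 / 2.0) + 0.5

mean = 1.0
entropies = {
    "exponential": entropy_exponential(mean),
    "uniform": entropy_uniform(mean),
    "half_normal": entropy_half_normal(mean),
}
for name, h in sorted(entropies.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {h:.3f} nats")
```

With only “positive, with this mean” as the constraint, the exponential comes out with the largest entropy of the three, which is the sense in which it commits to the least beyond the stated knowledge.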

But sometimes we DO know the frequency distribution. For example, suppose we’ve calibrated some electronic colorimetric instrument using a known color target, and we’ve measured that color target 1000 times and seen how the measurement error looks. When we next go to measure the color of some batch of printing ink, we already have a very good frequency distribution for the errors in this printing ink measurement.

Strong Distributional Assumptions:

When we construct a prior such as the following:

e ~ normal(1, 0.75/sqrt(N));

s <- e*sd(data);

or, what’s the same, a data dependent prior on s:

s ~ normal(sd(data),0.75*sd(data)/sqrt(N));
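The equivalence of the two forms is just the location-scale property of the normal distribution. The following Python sketch (an illustration, not part of the original post) builds s both ways from the same standard normal draw; the values of `sample_sd` and `n` and the function name `draws_both_ways` are hypothetical.

```python
import math
import random

def draws_both_ways(sample_sd, n, reps=5, seed=0):
    """Return pairs (s via e*sd(data), s via the data dependent prior) built from shared z."""
    rng = random.Random(seed)
    rel = 0.75 / math.sqrt(n)  # relative scale taken from the prior in the post
    pairs = []
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        e = 1.0 + rel * z                            # e ~ normal(1, 0.75/sqrt(N))
        s_scaled = e * sample_sd                     # s <- e*sd(data)
        s_direct = sample_sd + rel * sample_sd * z   # s ~ normal(sd(data), 0.75*sd(data)/sqrt(N))
        pairs.append((s_scaled, s_direct))
    return pairs

# Hypothetical values standing in for sd(data) and N.
pairs = draws_both_ways(sample_sd=2.3, n=25)
for s_scaled, s_direct in pairs:
    print(f"{s_scaled:.6f}  {s_direct:.6f}")
```

The two columns agree to floating point precision, so the first form can be read as a data independent prior on the multiplier e, with the data entering only through the deterministic rescaling.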

we are basing our distribution on some information, either analytical or from a simulation study, about the characteristics of the sampling process given the frequency distribution from which the data are sampled. In the “weak” case, by contrast, we are basing our model only on scientific background knowledge, like “things aren’t too spread out” or “the data have a well defined average” or “measurements are positive, and tend to be close to zero but can occasionally be big”.

When we construct something based on a simulation or analytical sampling analysis, we’re being very specific about what we think the sampling process means for the data. When we condition on our knowledge of this sampling process, we should expect our model to do a poor job whenever our assumption about the sampling process is violated.
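A simulation study of the kind mentioned here might look like the following Python sketch, which is an assumption about the approach rather than the calculation used in the post. It estimates the constant c in “sd of the sample sd is about c*sd/sqrt(N)”, first for normal data and then for heavier-tailed Laplace data; the function names and the settings N = 50 and 2000 replications are illustrative.

```python
import math
import random
import statistics

def sd_constant(draw, n=50, reps=2000, seed=2):
    """Estimate c such that the sample sd varies by roughly c/sqrt(n) in relative terms."""
    rng = random.Random(seed)
    sds = [statistics.stdev([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return math.sqrt(n) * statistics.stdev(sds) / statistics.mean(sds)

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_laplace(rng):
    # The difference of two independent exponentials is Laplace distributed.
    return rng.expovariate(1.0) - rng.expovariate(1.0)

normal_c = sd_constant(draw_normal)
laplace_c = sd_constant(draw_laplace)
print(f"normal data:  c is about {normal_c:.2f}")
print(f"laplace data: c is about {laplace_c:.2f}")
```

For normal data the large-sample value of this constant is 1/sqrt(2), about 0.71, so a prior built with a constant near that size encodes a normal-like sampling process. With the heavier-tailed data the constant comes out noticeably larger, so the same prior would be overconfident about s.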

Frequentist statistics relies heavily on this sampling concept. When the sampling model is true, it works well. Occasionally we can in fact verify the frequency distribution of some process prior to using it to collect data. When that occurs, the Bayesian is perfectly free to incorporate this additional information. Typically it will improve estimates, but if the assumptions are violated, it should be expected to give wrong answers.

Legitimate “data dependent priors” are likely to be those where we’re conditioning on some knowledge of a sampling process; after all, we’re probably calculating sample statistics to get the “data dependence”. To the extent that we can rewrite them in a different parameterization that’s data independent, certainly no one can complain about the form. It’s still perfectly reasonable to question the assumptions, and the knowledge we think we have, that go into the construction. But that’s true of every Bayesian analysis, whether it’s data dependent or not.
