I had a conversation with Christian Hennig that cleared up some confusion about what qualifies as Frequentist theory versus what qualifies as Bayesian theory.

Of course, we’re free to create technical, jargony definitions of things, but essential to my conception of Frequentism is the idea that probability is placed on observable outcomes only, so that in principle you could, if necessary, collect a large sample and obtain the real, actual frequency distribution, say in the form of a histogram. Then you could say, for example, that a given parametric distribution was a good enough model of that histogram. I quoted some Wikipedia text that said essentially the same thing, and it is consistent with various sources you might find online or in books. Even if this isn’t the only definition of Frequentism, it’s a common one.
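To make the Frequentist picture concrete, here is a minimal sketch of “get a large sample, look at the histogram, and ask whether a parametric distribution is good enough.” The data are simulated from a gamma distribution purely as a stand-in for real repeated observations, and the normal model, the 40 bins, and the Kolmogorov–Smirnov comparison are all illustrative choices, not anything from the conversation above. Note that plugging estimated parameters into `scipy.stats.kstest` makes its p value only approximate.

```python
import numpy as np
from scipy import stats

# Stand-in for a large sample of observable outcomes (simulated here; in
# practice these would be real repeated measurements).
rng = np.random.default_rng(1)
observations = rng.gamma(shape=9.0, scale=1.0, size=10_000)

# The empirical frequency distribution, in the form of a histogram.
counts, edges = np.histogram(observations, bins=40, density=True)

# Candidate parametric model: a normal distribution fit to the same sample.
mu_hat, sd_hat = observations.mean(), observations.std(ddof=1)
centers = 0.5 * (edges[:-1] + edges[1:])
model_density = stats.norm.pdf(centers, loc=mu_hat, scale=sd_hat)

# Two crude ways to ask "is the normal good enough for this histogram?"
max_abs_gap = np.max(np.abs(counts - model_density))
ks = stats.kstest(observations, "norm", args=(mu_hat, sd_hat))
print(f"largest histogram-vs-model gap: {max_abs_gap:.4f}")
print(f"KS statistic: {ks.statistic:.4f}, p value: {ks.pvalue:.3g}")
```

With a sample this large even small departures from normality get flagged, so “good enough” remains a judgment about the size of the gap, not just the p value.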

The alternative, Bayesian viewpoint, in my view, was that you put probability distributions either on quantities that don’t vary (like the speed of light, or the gravitational acceleration in your lab during your Wednesday morning experiment), or on things that do vary but where the distribution is notional: it has no validated connection to observed frequency and just represents your own information about how widely things may vary.

The question “could this random number generator have produced our dataset?”, which is essentially the purpose to which a p value is put, is not exclusively the realm of Frequentist statistics. Every time we fit a Bayesian model, we could be considered to be asking that question over and over again, using the likelihood of the data under the RNG model to answer it and to determine whether to keep our parameter vector in the sample or not (for example, at each accept/reject step of a Metropolis-type MCMC sampler).
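A minimal random-walk Metropolis sketch shows the “ask the question, then keep or reject the parameter” loop explicitly. The normal data model with known standard deviation 1, the normal(0, 10) prior, the step size, and the simulated data are all assumptions made for illustration; real fitting software uses more sophisticated samplers, but the role of the likelihood is the same.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical dataset: 20 values assumed to come from a normal "RNG" with
# unknown mean mu and known standard deviation 1.
data = rng.normal(loc=3.0, scale=1.0, size=20)

def log_likelihood(mu):
    # How plausible is it that a normal(mu, 1) generator produced `data`?
    return stats.norm.logpdf(data, loc=mu, scale=1.0).sum()

def log_prior(mu):
    # Illustrative, weakly informative prior (an assumption of this sketch).
    return stats.norm.logpdf(mu, loc=0.0, scale=10.0)

def metropolis(n_steps=5000, step=0.5, mu0=0.0):
    samples = np.empty(n_steps)
    mu = mu0
    lp = log_likelihood(mu) + log_prior(mu)
    for i in range(n_steps):
        proposal = mu + step * rng.normal()
        lp_prop = log_likelihood(proposal) + log_prior(proposal)
        # Keep the proposed parameter with probability min(1, ratio);
        # otherwise the current value is repeated in the sample.
        if np.log(rng.uniform()) < lp_prop - lp:
            mu, lp = proposal, lp_prop
        samples[i] = mu
    return samples

draws = metropolis()
print(f"posterior mean of mu (after burn-in): {draws[1000:].mean():.3f}")
```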

What this means is that a lot of people are using what you’d call Bayesian methods under this classification scheme without really thinking about it. Take, for example, linear mixed models, or “random effects” models. These are hierarchical models in which the normal-distribution likelihood is *very* rarely checked against the data (i.e., goodness-of-fit tests are rarely run), and the distribution over the “random effects” is essentially *never* tested against a repeated sampling process. This means the distribution over the random effects is just a Bayesian distribution: it represents how likely it is to get effects of a given size. It is typically taken as a normal distribution with mean 0 and a standard deviation that is essentially a maximum likelihood estimate (the standard deviation is estimated during the numerical fit, by maximum likelihood or, by default in lme4’s lmer, by restricted maximum likelihood).
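Here is a sketch of how that standard deviation gets “found during the numerical fit” for the simplest case, a one-way random-intercept model with a balanced design. It maximizes the plain likelihood (not REML, which is what lme4’s `lmer` uses by default), and the simulated data, group sizes, and true parameter values are hypothetical. Integrating out each group’s effect makes that group’s observations multivariate normal with covariance sigma² I + tau² times the all-ones matrix.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(3)

# Simulated balanced design: J groups with n observations each.
# y_ij = mu + b_j + e_ij, b_j ~ normal(0, tau), e_ij ~ normal(0, sigma).
J, n = 30, 10
true_mu, true_tau, true_sigma = 5.0, 2.0, 1.0
b = rng.normal(0.0, true_tau, size=J)
y = true_mu + b[:, None] + rng.normal(0.0, true_sigma, size=(J, n))

def neg_log_lik(params):
    mu, log_sigma, log_tau = params
    sigma2, tau2 = np.exp(2 * log_sigma), np.exp(2 * log_tau)
    # Marginal covariance of one group's n observations after integrating
    # out that group's random intercept.
    cov = sigma2 * np.eye(n) + tau2 * np.ones((n, n))
    mvn = stats.multivariate_normal(mean=np.full(n, mu), cov=cov)
    return -mvn.logpdf(y).sum()  # rows of y are independent groups

start = np.array([y.mean(), np.log(y.std()), np.log(y.std())])
fit = optimize.minimize(neg_log_lik, start, method="Nelder-Mead",
                        options={"maxiter": 2000})
mu_hat, sigma_hat, tau_hat = fit.x[0], np.exp(fit.x[1]), np.exp(fit.x[2])
print(f"mu={mu_hat:.2f}, sigma={sigma_hat:.2f}, tau={tau_hat:.2f}")
```

The optimizer treats tau like any other parameter; nothing in the fit checks whether the 30 group effects actually look like draws from a normal distribution.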

In fact, there are plenty of situations where a mixed effects model is used with only a few groups. The simplest would be 2 groups, but let’s say 8, as in the “8 schools” example. The distribution of individual effects in the 8 schools *cannot* be a normal distribution; the effects are just fixed values, one for each school. The idea that these 8 are a sample from a normal distribution is entirely notional and has no direct connection to observable frequency.
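One way to see that the normal distribution over the 8 fixed school effects is doing a prior’s job: conditional on a population mean mu and spread tau, each school’s effect is pulled toward mu by a precision-weighted average, exactly as a normal prior would pull it. The estimates and standard errors below are the rounded 8 schools values as commonly reproduced in tutorials; the values of mu and tau are hypothetical, picked only to illustrate the arithmetic.

```python
import numpy as np

# Rounded 8 schools estimates and their standard errors.
y = np.array([28, 8, -3, 7, -1, 1, 18, 12], dtype=float)
se = np.array([15, 10, 16, 11, 9, 11, 10, 18], dtype=float)

# Hypothetical population-level values, for illustration only.
mu, tau = 8.0, 5.0

# With theta_j ~ normal(mu, tau) over the fixed effects and
# y_j ~ normal(theta_j, se_j), each effect's conditional mean is a
# precision-weighted average of its own estimate and mu.
precision_data = 1.0 / se**2
precision_pop = 1.0 / tau**2
theta_mean = (precision_data * y + precision_pop * mu) / (precision_data + precision_pop)
theta_sd = np.sqrt(1.0 / (precision_data + precision_pop))

for j, (raw, shrunk, sd) in enumerate(zip(y, theta_mean, theta_sd), start=1):
    print(f"school {j}: raw {raw:6.1f} -> {shrunk:6.1f} (sd {sd:.1f})")
```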

The only thing these kinds of models don’t have is an explicit proper prior. And Bayes isn’t just “frequentism with priors”; it’s probability as a measure of credence in an idea. People are already doing Bayesian analysis every time they run a mixed effects model; they’re just dodging some of the responsibility for it by hiding it under the hood of “well-accepted practices codified into the lme4 library” or some such thing.
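To illustrate what an explicit prior changes, here is a sketch on the same rounded 8 schools numbers. It computes the marginal likelihood of tau on a grid, profiling out mu, and compares the likelihood maximizer (equivalent to an implicit flat prior on tau) with the maximizer after adding a half-normal(0, 10) log prior. The grid, the profile-likelihood shortcut, and the prior scale are all assumptions of this sketch, not what lme4 does internally.

```python
import numpy as np
from scipy import stats

# Rounded 8 schools estimates and standard errors.
y = np.array([28, 8, -3, 7, -1, 1, 18, 12], dtype=float)
se = np.array([15, 10, 16, 11, 9, 11, 10, 18], dtype=float)

tau_grid = np.linspace(0.0, 30.0, 3001)

def profile_log_lik(tau):
    # Marginally y_j ~ normal(mu, sqrt(tau^2 + se_j^2)); profile out mu with
    # its precision-weighted estimate at this value of tau.
    var = tau**2 + se**2
    w = 1.0 / var
    mu_hat = np.sum(w * y) / np.sum(w)
    return stats.norm.logpdf(y, loc=mu_hat, scale=np.sqrt(var)).sum()

log_lik = np.array([profile_log_lik(t) for t in tau_grid])

# Implicit choice: maximizing the likelihood alone (a flat prior on tau).
tau_ml = tau_grid[np.argmax(log_lik)]

# Explicit choice: add a half-normal(0, 10) log prior (the scale is an assumption).
log_prior = stats.halfnorm.logpdf(tau_grid, scale=10.0)
tau_map = tau_grid[np.argmax(log_lik + log_prior)]

print(f"tau maximizing likelihood: {tau_ml:.2f}")
print(f"tau maximizing likelihood + half-normal prior: {tau_map:.2f}")
```

Because the half-normal log density decreases in tau, the penalized estimate can never exceed the plain likelihood maximizer; the point is only that the prior is now written down where it can be argued about.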

Next up: using an ABC-like (approximate Bayesian computation) process and a likelihood based on a p value, we can construct confidence intervals, showing that confidence intervals are just shitty flat posterior intervals.
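As a rough preview, here is one guess at the simplest version of that idea, not necessarily the construction a follow-up post would use. Propose values of a normal mean from a flat range and keep each one only if a two-sided z test does not reject “a normal(mu, sigma) RNG produced this data” at the 5% level. The known sigma, the sample size, the proposal range, and the simulated data are all hypothetical. The kept values span, up to sampling resolution, the usual 95% z interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Hypothetical data: n draws from a normal with known sigma.
n, sigma = 25, 1.0
data = rng.normal(loc=2.0, scale=sigma, size=n)
xbar = data.mean()

# ABC-like step: flat proposals for mu, kept only when the two-sided
# z-test p value exceeds 0.05 (a 0/1 "likelihood" built from the p value).
proposals = rng.uniform(xbar - 3.0, xbar + 3.0, size=200_000)
z = (xbar - proposals) / (sigma / np.sqrt(n))
p_values = 2.0 * stats.norm.sf(np.abs(z))
kept = proposals[p_values > 0.05]

# Compare with the textbook 95% confidence interval for a normal mean.
half_width = stats.norm.ppf(0.975) * sigma / np.sqrt(n)
print(f"ABC-like range: [{kept.min():.3f}, {kept.max():.3f}]")
print(f"z interval:     [{xbar - half_width:.3f}, {xbar + half_width:.3f}]")
```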