# Bayesian models are also (sometimes) Non-Reproducible

Shravan at Gelman’s blog writes “I moved away from frequentist to Bayesian modeling in psycholinguistics some years ago (2012?), and my students and I have published maybe a dozen or more papers since then using Bayesian methods (Stan or JAGS). We still have the problem that we cannot replicate most of our and most of other people’s results.”

So, Bayesian modeling isn’t a panacea. Somehow I think this has been the take-away message for many people from my recent discussions of the philosophy of what Bayes vs Frequentist methods are about. “Switch to Bayes and your models will stop being un-reproducible!”

So, I feel like I’ve failed (so far, in part), because that certainly isn’t my intention. The only thing that will make your science better is to figure out how the world works. This means *figuring out the details of the mechanism that connects some observables to other observables using some parameter values which are unobservable*. If you’re doing science, you’re looking for mechanisms, causality.

But I believe that Bayesian analysis helps us detect when we have a problem, and by doing Bayesian analysis we can force a kind of knowledge of our real uncertainty onto the problem, which is something we can’t do with Frequentist procedures!

That detection of un-reproducibility, and the ability to force realistic evaluations of uncertainty, both help us be less wrong.

## Some Bayesian Advantages

So, what are some of the things that make Bayesian models helpful for detecting scientific hypotheses that are unreproducible or keep our models more realistic?

## Consistency is Detectable:

First off, there’s a huge flexibility of modeling possibilities, with a single consistent evaluation procedure (Bayesian probability theory).

For example, you can easily distinguish between the following cases:

1. There is a single approximately fixed parameter in the world which describes all of the instances of the data (e.g. charge of an electron, or vapor pressure of water at temperature T and atmospheric pressure P, or the maximum stack depth of linguistic states required for a pushdown automaton to parse a given sentence).
2. There are parameters that describe individual sets of data and all of these parameters are within a certain range of each other (e.g. basal metabolic rate of individual athletes, fuel efficiency of different cars on a test track).
3. There is no stable parameter value that describes a given physical process (e.g. basal metabolic rate of couch potatoes who first start training for a marathon, fuel efficiency of a car with an intermittent sensor problem, or, if you believe that Tracy and Beall paper Andrew Gelman likes to use as an example, the shirt color preferences of women through time).

When Shravan says that his Bayesian models are unreproducible, what does that indicate?

Most likely it indicates that he’s created a model like (1) where he hypothesizes a universal facet of language processing, and then when he fits it to data and finds his parameter value, if he fits the model to different data, the parameter value *isn’t similar* to the first fit. That is, the universality fails to hold. If, in fact, he were to use the posterior distribution of the first fit as the prior for the second, he’d find that the posterior widened or maybe shifted somewhere that the posterior from the first analysis wouldn’t predict when the additional data was taken into account.

Or, maybe he’s got a model more like (2) where he hypothesizes that at least there’s some stable parameter for each person but that all the people are clustered in some region of space. But, when he measures the same person multiple times he gets different parameter values for each trial, or when he measures different populations he winds up with different ranges of parameter values.

Or, maybe he has a model like (3) where he expects changes in time, but the kinds of changes he sees are so broad and have so little structure to them that they aren’t really predictive of anything at all.

In each case, probability as extended logic gives you a clear mechanism to detect that your hypothesis is problematic. When you collect more data and it fails to concentrate your parameter estimates, or it causes the parameters to shift outside the region they had previously been constrained to, it indicates that *the model is inconsistent and can’t* explain the data sufficiently.

## Bayesian models can be structured to be valid even when the data isn’t an IID sample:

**IID sampling is just way more powerful mathematically than the reality of running real experiments**, so that assumption, implicit in a Frequentist test, will fool you into failing to realize the range of possibilities. In other words, Frequentism relies on you throwing away knowledge of alternatives that might happen under other circumstances (“let the data speak for themselves”, which is sometimes attributed to Ronald Fisher but may not be something he really said).

Suppose instead you are analyzing performance of a linguistic test in terms of a Frequency distribution of events. The basic claim is that each event is an IID sample from a distribution $$F(x ; a,b)$$ where x is the measured value, and a,b are some fixed but unknown parameters. Now, you do a test to see if $$a > 0$$ and you reject this hypothesis, so you assume $$a < 0$$ and you publish your result. First off, the assumption of a fixed $$F$$ and IID sampling is usually a huge, huge assumption. And, with that assumption comes a “surety” about the past, the future, about other places in the world, other people, etc. **In essence what that assumption means is “no matter where I go or when I take the measurements all the measurements will look like samples from F and every moderate size dataset will fill up F”.** How is this assumption different from

**“no matter where I go all I know about the data is that they won’t be unusual events under distribution P” which is the Bayesian interpretation?**

Consider the following specific example: P = normal(0,10), and I go and I get a sample of 25 data points, and I find they are all within the range -1,1. A Frequentist rejects the model of normal(0,10). Any *random sample* from normal(0,10) would contain values in the ranges say -15,-1 and 1,15 but this sample doesn’t.
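To put a rough number on how surprising that sample would be under IID sampling, here is a minimal sketch. It assumes that normal(0,10) means a standard deviation of 10 (as in Stan's parameterization) and that the 25 points really are independent draws; the helper name `normal_cdf` is just for illustration.

```python
import math

def normal_cdf(x, mu=0.0, sd=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

SD = 10.0  # assumption: normal(0,10) is parameterized by standard deviation
N = 25     # sample size from the example

# Probability that one IID draw lands inside [-1, 1]
p_single = normal_cdf(1.0, 0.0, SD) - normal_cdf(-1.0, 0.0, SD)

# Probability that all N independent draws land inside [-1, 1]
p_all = p_single ** N

print(f"P(one draw in [-1, 1])     = {p_single:.4f}")
print(f"P(all {N} draws in [-1, 1]) = {p_all:.3e}")
```

Under the IID reading this probability is astronomically small, which is why a frequency-based check rejects the model. The Bayesian reading discussed next does not treat the sample as a representative draw in the first place.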

Does the Bayesian reject the model? No, not necessarily, because Bayesians know that a) the probability is *in their head* based on the knowledge they have, whereas the Frequency is *in the world* based on the laws of physics, and b) in many, many cases there is no sense in which the data is a representative random sample of all the things that can happen in the world. So, even if Bayesians really do believe that $$P=F$$ is a stable Frequency distribution across the full range of conditions, there’s nothing to say that the current conditions couldn’t be all within a small subset of what’s possible. In other words, the Bayesian’s knowledge may tell them only about the full range of possibilities, but not about the particulars of how this sample might vary from the full range of possibilities. Since Bayesians know that their samples aren’t IID samples from $$F$$ they need not be concerned that it fails to fill up all of $$F$$. The only time they need to be concerned is if they see samples that are far outside the range of possibilities considered, like, say, x=100 in this example. That would essentially never happen under normal(0,10).

To the Bayesian, x ~ normal(0,s); s ~ prior_for_s(constants) is very different than x ~ normal(0,constant). In the first case the scale s is unknown and has a kind of range of possibilities which we can do inference on. The assumption is implicitly that the data is informative about the range of s values. In the second case, the constant scale comes from theoretical principles and we’re not interested in restricting the range down to some other value s based on the data.

This can be important, because it lets Bayesians impose their knowledge on the model, forcing the model to consider a wide range of possibilities for data values even when the data doesn’t “fill up” the range. Furthermore, the Bayesian can be less dogmatic than that: they can provide informative priors that are not as informative as the delta-function (i.e. a fixed constant) but also will not immediately mold themselves to the particular sample. For example, s ~ gamma(100,100.0/10) says that the scale is close to 10 and this information is “worth” about 100 additional samples, so don’t take the 25 samples we’ve got as fully indicative of the full range.
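As a quick check on what that prior says, here is a small sketch. It assumes gamma(100, 100.0/10) uses the shape/rate parameterization (as Stan does); under that assumption the mean is shape/rate and the standard deviation is sqrt(shape)/rate. Python's `random.gammavariate` takes a scale rather than a rate, so the rate is inverted before sampling.

```python
import math
import random

shape = 100.0
rate = 100.0 / 10  # assumption: second argument is a rate, not a scale

prior_mean = shape / rate           # mean of a gamma(shape, rate)
prior_sd = math.sqrt(shape) / rate  # standard deviation of a gamma(shape, rate)

# Draw from the prior to see how tightly it sits around 10.
random.seed(0)  # fixed seed so the sketch is repeatable
draws = [random.gammavariate(shape, 1.0 / rate) for _ in range(10000)]
frac_near_10 = sum(8.0 <= s <= 12.0 for s in draws) / len(draws)

print(f"prior mean = {prior_mean}, prior sd = {prior_sd}")
print(f"fraction of prior draws with 8 <= s <= 12: {frac_near_10:.3f}")
```

So the prior keeps the scale near 10, give or take about 1, which is much looser than a fixed constant but far too tight to be dragged down to a scale of 1 by 25 points clustered in [-1, 1].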

The assumption of *random sampling* that comes out of Frequentist theory says that there’s *no way in hell* that you could fail to fill up the Frequency distribution if your samples were 25 random ones. This means implicitly that the current data and any future data look “alike” for sufficiently large sample sizes. That’s a very strong assumption which *only holds* when you are doing random sampling from a finite population using an RNG. But all of the Frequentist tests rely on that assumption. Without that assumption you simply can’t interpret the p values as meaningful at all. If you know that your sample is not a random sample from a finite population but just “some stuff that happened through time where people were recruited to take part in various experiments in various labs” then assuming that they are random samples from a fixed population says automatically “when n is more than a smallish number, like 25 to 50, there is no way in hell we’ve failed to sample any part of the possible outcomes”.

Since the p values you get out of Frequentist tests are typically only valid when the data really IS an IID sample of the possibilities, you’re left with either being fooled by statistics, or being a Bayesian and imposing your knowledge onto your inferences.

For an example of how Frequentist models can go seriously wrong, consider the Fukushima disaster.

Assumptions that each year the maximum tsunami recorded is an IID sample from some population will tell you that after say 100 years you have a very good idea of the distribution F for all tsunami that will ever happen.

An alternative calculation could be done for example, where you look at all the energy releases from earthquakes whether they’re under water or not, and then you translate that energy release directly into the gravitational potential for a wall of water N kilometers long. Doing that you might well see that in the past, energy releases of E equivalent to say a 25 meter wall of water 100 km long have been observed, but maybe not under the ocean in such a way as to actually form a Tsunami.

Now, the Bayesian can say “I’ll accept the idea that Tsunamis could occur up to 25-30 meters high with moderate probability” whereas the lack of such a high Tsunami in a 100 year record would be taken under IID sampling as much stronger evidence of their infrequency.

But tsunamis are not IID sampled. A better probabilistic model would be something like a Markov chain, since physics has a Markovian style of dependence: what happens in the future depends on the current state. If we see a bunch of samples from a Markov chain, without any knowledge of how easy it is for that chain to move around the full distribution, we can’t know whether some day it will move into a new range of possibilities. But physical conservation laws and so forth DO provide us with realistic constraints, so the calculation in terms of gravitational potential has a kind of connection to the range of possibilities that observations of frequencies just don’t.

My complaint here about IID sampling is more or less the essence of Taleb’s “Fooled by Randomness” complaint. The data doesn’t fill up the distribution except in the infinite time limit, and by infinite time, we don’t mean 100 years, we mean, maybe 4.5 Billion. At any moment, sampling through time, we could move into a new regime, a new climate, a new mass extinction event, visitations by aliens, the discovery of economically feasible nuclear fusion, a cheap solar energy source, a non-volatile, non-wearing, extremely-low-power-consuming, high speed random access memory that obviates the need for spinning disks… a whole population of swans that are dark in color, whatever.

Over at Gelman’s blog, Keith points out that we always get concentration if we force the Bayesian machinery to assume a single global parameter (as in my case 1), but I also point out that it need not concentrate on a consistent place. The nice part about Bayesian machinery is that you have a single consistent rule for updating from any amount of data. So, for example, if you model something as normal(mu,sigma) and it’s really normal(t,1), so that the mean is a linear function of time, then if you take 10 samples at t=0 you’ll concentrate around mu=0, sigma=1.

Taking an additional sample of 10 at time t=10, relative to the original posterior which is your new prior, you’ll get a new update for mu = 5, sigma = 5 or so!!!
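A rough simulation of this drift example is sketched below. It uses the sample mean and sample standard deviation as stand-ins for where the posterior of a normal(mu, sigma) model would concentrate under weak priors, and it relies on the fact that updating sequentially gives the same posterior as fitting all the data at once. The seed and variable names are illustrative assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# The "true" process is normal(t, 1): the mean drifts linearly with time.
first = [random.gauss(0.0, 1.0) for _ in range(10)]    # 10 samples at t = 0
second = [random.gauss(10.0, 1.0) for _ in range(10)]  # 10 samples at t = 10

# Fit after the first batch: concentrates near mu = 0, sigma = 1.
mean_first = statistics.mean(first)
sd_first = statistics.stdev(first)

# Fit after both batches: the single global (mu, sigma) has to cover both clusters.
pooled = first + second
mean_pooled = statistics.mean(pooled)
sd_pooled = statistics.stdev(pooled)

print(f"after batch 1: mu ~ {mean_first:.2f}, sigma ~ {sd_first:.2f}")
print(f"after batch 2: mu ~ {mean_pooled:.2f}, sigma ~ {sd_pooled:.2f}")
```

The second fit lands in a region (mu near 5, sigma several times larger than 1) that the first posterior would have considered wildly implausible. That jump is the signal that the single-global-parameter model is broken.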

It’s possible to detect inconsistency by looking at posterior parameters relative to the prior and showing that we’re surprised, that there’s some element of breakage in our model.

In a frequentist analysis, you’d be able to easily reject the idea that the two samples were from the same distribution, so it’s not that they somehow can’t get the right answer in a frequentist analysis. Frequentists frequently figure out ways to get the right answer. The advantage of Bayes is that it automatically does logical calculations that give you a full logical view of what your model means, even if that model is broken!

Who told you that frequentists can only do IID and can’t use Markov Chains, time series and all kinds of other models for dependence and non-identical distributions?

Actually, the IID assumption with unknown parameter for frequentists corresponds to exchangeability (equivalent to IID conditionally under the parameter) for Bayesians. Both of these are often assumed but by no means necessary.

I think we have to distinguish between people who consider themselves Frequentists, and analyses that really are Frequentist. So in a state-space Markov chain time-series type situation, imagine we are trying to put a probability measure over, say, monthly observations of a state of some system for 5 years. Suppose there are, say, 10 possible states. So each time series is a vector of 60 observations. The number of possible vectors is 10^60, the number of possible transitions is maybe less, let’s say there’s an average of 4 transitions out of each state, so maybe 4^60 = 1.3e36 possible transitions. You observe 20 or 40 or 100 or even 2000 time series (let’s pretend they’re people transitioning between different states of health or economic conditions).
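The counting above is easy to reproduce. This is only a back-of-the-envelope sketch using the numbers in the comment (10 states, 60 months, roughly 4 transitions out of each state, at most 2000 observed series); the variable names are illustrative.

```python
n_states = 10        # possible states per month
n_months = 60        # 5 years of monthly observations
n_branches = 4       # assumed average number of transitions out of a state
n_observed = 2000    # the largest dataset size mentioned

all_vectors = n_states ** n_months        # every conceivable 60-month history
reachable_paths = n_branches ** n_months  # rough count if only ~4 moves are possible

# Fraction of the rough path space that even 2000 distinct series could cover
coverage = n_observed / reachable_paths

print(f"all vectors     = {float(all_vectors):.1e}")
print(f"reachable paths = {float(reachable_paths):.1e}")
print(f"best-case coverage with {n_observed} series = {coverage:.1e}")
```

Even in the best case the observed series cover a vanishing fraction of the possible histories, which is the sense in which a frequency distribution over whole 60-month vectors cannot be checked against data.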

In what sense can we justify having any kind of Frequency distribution over these vectors of 60 states? It’s an imaginary fiction. The fact is, when you set up a moderate sized Markov Chain problem like this you’re doing a Bayesian analysis because the only sense in which these distributions exist is in your head.

It’s a fiction, all right. I never denied that, and personally I don’t have any bigger problem making sense of this fiction (which actually works as defining a point of view from which data can be connected to some unknown reality about which we’d like to find out) than of numerical results of analyses the meaning of which is only implicitly defined by some axiomatic system and/or is interpreted by some people by equally imaginary allusions such as “counting all the potential worlds in which X is true”.

I’d personally not call an analysis Bayesian that doesn’t use Bayesian updating starting from a prior, but you are of course free to do that if it pleases you.

“I’d personally not call an analysis Bayesian that doesn’t use Bayesian updating starting from a prior”

So, how do we fit this example model? If we try out a variety of parameters and reject those that make the data have a low p value based on some test statistic, then I guess we might be doing a Frequentist analysis. But typically what’s done is maximum likelihood, and in that case, we’ve got a distribution over vectors (a likelihood) that represents our knowledge of what might be reasonable, not what the frequencies in repeated sampling of 60 state vectors are supposed to be, and we’ve done the Bayesian math to find a maximum posterior probability estimate based on a prior (yes, a prior not necessarily explicitly mentioned, but a nonstandard flat prior over each parameter nevertheless), so it IS a Bayesian update. It’s just one where a lot of useful information about the posterior is thrown away.

And in all of this, I think that’s the point I keep trying to hammer on. Lots of people think that Frequentist methods are perfectly fine for lots of problems because they can point to these sophisticated “Frequentist” examples. But these examples tend to 1) Choose distributions that basically describe some kind of knowledge about the process, not a validated stable histogram of repeated outcomes and then 2) Find the parameters by doing the Bayesian math and finding the highest posterior density estimate, using a flat prior.

You can’t point to that and say “See Frequentist Methods Work!” It’s like pointing to a steel structure that arrived from the fabrication shop with heavy coats of paint on it, and saying “see I didn’t put any paint on it and it’s not rusting at all! unpainted structures are just fine!”

For me frequentism refers to a way to interpret what probabilities are, not to a specific way to analyze data. On the contrary, I’d use the term “Bayesian” rather for certain ways to analyze data (Bayes himself was very brief and somewhat ambiguous about his interpretation of probability). There is more than one interpretation of probability around among people who call themselves Bayesians, and Bayesian methods can well be paired with a frequentist (or related; one can debate what exactly the interpretation is and whether in many cases it should better be called “long run propensity” rather than “frequentist”) interpretation of probability. I’ve read many papers in which Bayesian data analyses, choice of prior and “sampling model” etc. were explained in a way that suggested that the authors thought about these in frequentist ways rather than Jaynesian or de Finettian. This is all fine by me as long as they are aware of this, as is the use of hypothesis tests and p-values where they make sense.

Regarding this idea that Maximum Likelihood and the like is “really” Bayesian and the prior is just implicit – well, a prior gives you one more layer of flexibility for writing things down, so obviously you can technically reconstruct some simpler stuff as a special case of the more complex system you like, and I can hardly object. Except that it doesn’t make something that was frequentist before any less frequentist, because these are just not opposites in my view, and calling something Bayesian that doesn’t make any explicit use of what Bayes actually added to probability theory just isn’t particularly enlightening.

See, I’m not objecting to the analyses that you seem to favour here. I’m objecting to the unnecessarily dogmatic (and potentially ill-informed) frequentist-bashing that you do. By and large I perceive your blog as intelligent and insightful if a bit dogmatic and philosophically biased, and I’d like it better without the last bit.

To me, it helps in modeling if you have a point of view that organizes your thoughts. Prior to about 5 years ago I did lots of what you might call “classical” or “the usual stuff” type statistics. A little kernel density estimation, a little testing, a little least-squares regression, a little maximum likelihood stuff… And as I poked at it all more it seemed to have poor organizing principles.

So part of this blog, and I don’t want it to be the only part, is to discuss the organizing principles behind different kinds of analyses.

I think it helps to figure out what is the principle that makes us think that a particular analysis works, and it could help people who do these “classical” analyses to realize that they’re already basically doing Bayesian analyses, because then they can actually engage fully with that thought process, including using informed priors and choosing alternative likelihoods based on Bayesian principles. So that’s more or less the reason why I spend time on that issue. I think there are lots of other good blog topics though so maybe I’ll spend a little time on some other ones because that is actually a goal of mine, to keep the blog diverse.

That’s all fine by me, but you could do this without making rubbish claims such as “But all of the Frequentist tests rely on that assumption.” Not so.

A Frequentist model will always be rejected when a sufficiently large sample is taken and the sample doesn’t fill up the high probability region. Now, for high dimensional data, the sample size required may grow enormously, but it’s still the case that you can’t justify a model on Frequentist grounds if it predicts samples in some region of the space, and they don’t occur. That’s *what it means* to have a Frequentist model, that probability = frequency in very large samples.

However, on Bayesian grounds it doesn’t invalidate the model if there’s moderate probability for some region x and yet x never occurs. That’s the point of the statement you’re quoting. A “test” is a way of checking whether some Frequency assumption is violated, they compare data to the output of random number generators. The logic of Bayesian modeling doesn’t require that the data fill up the distribution, only that it not be in the extremely unusual portion of the distribution. Those are the fundamental organizing principles behind Frequentist and Bayesian models.

We should never believe that the world is exactly like our models. I think that your criticism of frequentism is based on the assumption that a frequentist must believe that the model fits the world perfectly.

There’s no use in testing a frequentist model for deviations that are not relevant for what we want the model to achieve. Obviously all observed data are discrete so you can design a test that rejects every continuous model (the issue is the same for Bayesians but doesn’t play out in terms of tests there). But we don’t do that, because this mismatch between model and reality is not normally relevant to the kind of conclusion that we want to draw from the model. Same if we have large amounts of data. You can easily find a test that rejects a parsimonious model but there’s no use in doing that. We’d want to detect some deviations from the model that would lead us to misleading conclusions in case we continue using the model despite them; other deviations can be tolerated.

“Bayesian modeling doesn’t require that the data fill up the distribution, only that it not be in the extremely unusual portion of the distribution.” But this can easily happen and happens regularly in cases where people assume exchangeability but in fact there is dependence. (I’m using the term “dependence” here in a rather intuitive manner; for the Bayesians I should probably add “conditionally on the parameter” but then the Bayesian may deny that there’s any “true” distribution to which “in fact” could apply, so I mean something like “it behaves as if you’re generating data from a frequentist model with dependence that is big enough for making a difference”.)

“Same if we have large amounts of data. You can easily find a test that rejects a parsimonious model but there’s no use in doing that.”

But this is one of the major flaws in testing based inference. We reject a model with a test. How do we know whether it’s because “there’s no use in doing that”, or the model really is a bad one? What tells us that “there’s no use in doing this” vs “we should start again”? There is no logic to it, just guesswork. Typically the guesswork leads closer and closer to a Bayesian answer, because the Bayesian answer answers a real question: “How much do we know about the process?” not an imaginary question “how often would things happen if we continued to repeat this process indefinitely?”

The fact is that the core philosophy of Frequentism is that there is a “real” distribution which describes what would happen after a large repeated sampling experiment. All tests are based on the notion that we can reject stuff based on it not producing the right frequencies of some statistic.

The entirety of bootstrapping is predicated on my notion of “a moderate sample will fill up the real distribution” for example. If there’s some region of space that hasn’t occurred in your sample but could easily occur in the future at some point… Bootstraps will give horrendous results.
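The bootstrap point can be illustrated with a minimal sketch of the plain nonparametric bootstrap (resampling the observed values with replacement). The observed data here are hypothetical: 25 values that all happened to fall in [-1, 1], echoing the earlier normal(0,10) example.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical observed sample: 25 values that all landed in [-1, 1].
observed = [random.uniform(-1.0, 1.0) for _ in range(25)]

def bootstrap_resample(data):
    """Draw len(data) values from data with replacement."""
    return [random.choice(data) for _ in data]

# Track the most extreme values seen across many bootstrap resamples.
boot_min = float("inf")
boot_max = float("-inf")
for _ in range(2000):
    resample = bootstrap_resample(observed)
    boot_min = min(boot_min, min(resample))
    boot_max = max(boot_max, max(resample))

print(f"observed range:  [{min(observed):.3f}, {max(observed):.3f}]")
print(f"bootstrap range: [{boot_min:.3f}, {boot_max:.3f}]")
```

No resample can ever contain a value outside the observed range, so any region the sample failed to visit is invisible to the procedure, however many resamples are drawn.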

Typically, testing either corresponds to a Bayesian calculation, or it answers the wrong question. The main exceptions are where we’re talking about trying to construct a reliable random number generator machine or computer program. The “dieharder” tests are exactly the kind of thing that’s needed for their special purpose.

Frequency based inference relies on testing. The testing in essence asks the question “is this data a low-frequency occurrence under some assumptions?”

Bayesian inference relies on likelihood. It asks “under what assumptions is this dataset relatively high probability?” Of course, it defines “relatively” in terms of a trade-off of prior probability and data likelihood, and it defines probability in terms of “isn’t too surprising to us”, not “how often stuff happens in an infinite future string of occurrences”.

The part that gets confused is when people use likelihood based inference and think of it as Frequentist. It isn’t. Likelihood ratio tests may be, I haven’t thought too hard about that, but likelihood based inference on parameters *is* the Bayesian thing to do (when you have flat priors).

You’re still apparently in a place where things are Bayesian to the extent that they’re not frequentist and the other way round. Not in my book, see earlier.

“How do we know whether it’s because “there’s no use in doing that”, or the model really is a bad one?”

Well, I hope we know what you use the model for, and what precision is required. I also hope we know that some violations of model assumptions are problematic for certain computations and others aren’t (heavy tails will destroy a sample variance, light tails won’t).

You apply the same thinking when doing Bayesian modelling. For the very same reason for which you understand that testing continuity of a frequentist model doesn’t make sense, you’d also use a continuous Bayesian model despite the fact that you know that all data are discrete. These considerations are not so mysterious after all, only they are hardly ever spelled out.

“The fact is that the core philosophy of Frequentism is that there is a “real” distribution which describes what would happen after a large repeated sampling experiment.” For me that’s a thought construct, an idealization. Pretty much everyone knows this when pressed although admittedly many tend to forget it at times. The model should fit the process in some relevant respects, not in all respects. Again it’s the same with Bayesian modelling.

“The testing in essence asks the question “is this data a low-frequency occurrence under some assumptions?” ”

I’d say that the testing asks the question “Are the data compatible, in the sense defined by the test statistic, with the model?” The statement that the data are indeed a low-frequency occurrence under the model can never be confirmed. That’s not what testing is about.

“because the Bayesian answer answers a real question: “How much do we know about the process?”” – well, in order to get Bayes started, you need the answer pre-data in numerical form already before you start observing and analysing data.

A model is Bayesian when it uses Bayesian math and Bayesian reasoning, that is, a joint distribution p(Data, Parameters | Knowledge) which is usually written as a likelihood and prior: p(Data | Parameters, Knowledge) p(Parameters | Knowledge). If it uses a flat prior on the parameters and just the maximum probability value of the resulting distribution, it’s still Bayesian, just using a limited version of the full Bayesian reasoning. That is, you’ve chosen to throw away some flexibility. Sometimes people make this choice for the wrong reasons. They think somehow that it makes them “less subjective” or “non Bayesian” as if “Bayesian” is a wrong thing and requires avoidance. This isn’t always the motivation, but sometimes.
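The claim that maximum likelihood coincides with a maximum posterior estimate under a flat prior can be checked numerically. The sketch below uses made-up data, a normal(mu, 1) likelihood, and a simple grid search over mu; the informative normal(0, 1) prior is added only for contrast and is an assumption of this illustration, not something from the discussion.

```python
data = [1.2, 0.8, 1.9, 1.4, 0.7]               # made-up observations, mean 1.2
grid = [-3.0 + 0.01 * i for i in range(601)]   # candidate values of mu

def log_likelihood(mu):
    """log p(Data | mu) for x ~ normal(mu, 1), dropping additive constants."""
    return sum(-0.5 * (x - mu) ** 2 for x in data)

def log_flat_prior(mu):
    """Flat prior: the same (log) density everywhere on the grid."""
    return 0.0

def log_normal_prior(mu, sd=1.0):
    """Informative prior mu ~ normal(0, sd), dropping additive constants."""
    return -0.5 * (mu / sd) ** 2

def argmax(log_density):
    """Grid point with the highest value of log_density."""
    return max(grid, key=log_density)

mle_mu = argmax(log_likelihood)
map_flat = argmax(lambda mu: log_likelihood(mu) + log_flat_prior(mu))
map_informed = argmax(lambda mu: log_likelihood(mu) + log_normal_prior(mu))

print(f"maximum likelihood estimate:  {mle_mu:.2f}")
print(f"MAP with flat prior:          {map_flat:.2f}")
print(f"MAP with normal(0, 1) prior:  {map_informed:.2f}")
```

The first two estimates are identical because the flat prior only adds a constant to the log posterior. Swapping in an informative prior moves the estimate, which is the flexibility that the flat-prior point estimate gives up.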

In particular though, this model doesn’t reject anything based on Frequency grounds. The question is just can we find some parameter values in the high probability density region of the resulting posterior distribution and do they correspond approximately to reality?

A model is Frequentist when it uses the frequency properties of sampling from a distribution for inference. If you accept or reject at a certain p value to form a confidence interval, you’re doing Frequentist inference. If you choose to regularize by restricting your models to various families a la Laurie Davies’ examples on Gelman’s blog, you’re still fitting a random number generator to your data so that you can produce some “fake data” and see if it’s “like” your real data when the parameters have various values.

If you believe that your goal in choosing a model is to approximate the frequency properties of your data to within a sufficiently good “epsilon” which need not be zero, but should be in some sense “small” then you can’t agree with my orange juice example, because it was specifically chosen to FAIL to approximate the sampling distribution in a very obvious way:

http://models.street-artists.org/2014/03/21/the-bayesian-approach-to-frequentist-sampling-theory/

Going into the analysis we know that there’s an upper bound on what an orange juice carton can hold that’s in the vicinity of, say, 2.5 liters or less, and yet intentionally, I chose the exponential distribution which has density out to infinity and nontrivial density out to 4 or 5 TIMES the maximum that we know can fit. The math nevertheless works out and gives acceptable inference in a Bayesian sense. A whole variety of frequentist tests would reject this exponential distribution.

Is it possible to get good frequentist inference from the exponential distribution in this case? I’m sure you could post-hoc knowing that the Bayesian calculation produces OK inference, concoct some way to make it work sufficiently well for a Frequentist inference procedure.

To get “good” defensible Frequentist inference, you’d have to take advantage of the random selection and probably do something like bootstrapping, or kernel density estimation or choose a class of models and test the parameters to see if the sample was unusual under those parameters to get a confidence interval.

You can lecture me about your Bayesian ideas and that’s fine by me because I’m actually a pluralist rather than a frequentist, and I appreciate your thoughts on this. But I’m not going to let you lecture me on what frequentists have to do, have to believe etc. I have explained above my understanding of frequentism and that a frequentist doesn’t necessarily have to use p-values and can well use Bayesian methodology (and certainly goes beyond iid models), and also that a frequentist (and not only those) model is a thought construct that only needs to reflect reality as far as we use it to make statements about reality, and therefore only needs to mimic the data in certain respects but not in all. Frequentism has a lot of problems but many of them have a correspondence in Bayesian statistics and Bayesian statistics has some of its own. I cannot stop you from doing the kind of frequentist-bashing that you’re keen to do, so I will leave this discussion for now.

So at this point I think it’s clear that we’re talking past each other about the definition of words rather than substantive ideas, so perhaps it’s fine to drop the discussion at the moment. I need a word to refer to “analysis which relies on testing whether the frequency of some outcome is unusual under a model” and so I use Frequentist for that kind of analysis. I don’t really apply Frequentist and Bayesian to people except in a short-hand colloquial way. People are of course free to think about different problems in different ways. I’ve certainly done frequentist analysis when I thought it made sense (particularly when I had randomized samples or when I had a computer procedure that selected things from a large set and wanted to find out if that selection was unusual under some kind of “uniform random” or other fairly defined selection criterion).

It’s really a kind of logic or thinking or analysis which is in question here, not what a person “truly believes” etc. Does an analysis rely on the frequency properties of a distribution to define the mathematical operations that it entails? Then it has a different structure than if your analysis relies on a measure of plausibility. We need words to describe both things. The standard ones are Frequentist and Bayesian, but these also get used in more general contexts so there’s ambiguity perhaps.

It’s perfectly possible for a single person to be, as you say, pluralist, and they need not actually believe certain stuff. When I say “a frequentist has to believe …” this is really shorthand for “the logical consequences of doing an analysis in which a model is taken to represent a set of repeated samples with a stable frequency distribution and a test is performed to determine whether that assumption can be rejected are in part…” You can see why that gets abbreviated.

I think it’s important to explore the logical consequences of taking a model in hand and trying to find out something in the world from it, and the logical consequences are different depending on whether you’re testing the frequency properties of a functional of a distribution under a random sampling assumption, or determining the associated degree of reasonableness using Bayesian mathematics.

One question of interest to practitioners is “what the heck am I doing anyway?” because many people are taught procedures and formulas and so forth and don’t think about the structure underlying those ideas. From that perspective, considering whether “something that is typically done” is “a restricted kind of Bayesian analysis” or “a test-based analysis assuming a frequency distribution” is a useful compare-and-contrast exercise.

I said that I wanted to leave the discussion, at least for now, but let me write anyway that I like your last posting and I agree with much in it, particularly this now doesn’t sound anymore as if you believe you have the ultimate authority to define what the terms “frequentist” and “Bayes” “really” mean. You can do that for yourself, of course, but it may not match how others use it… I know that my own use of the terms is not necessarily shared by a majority so I face this problem as well. We’re not talking math here so we need to acknowledge that such terms come with some ambiguity, and discussing their definition is partly “just about use of words” but also comes with some philosophical implications that may have an impact on practice.