# Feynman on Physical Law (apropos of Cox’s Theorem)

I know, I know, everyone either loves Feynman or finds him pompous… but I do think this clip has something useful to say about what makes science different from all the other ways we might try to make sense of the world. And, in doing so, it helps me make the point about what part Cox’s theorem and Bayesian probability theory play in science.

**Guess** is more or less his starting point. Formally, this means we specify some mathematical model that predicts the outcomes of experiments. We might also specify several competing models. We need to write down these models in a formal system; in modern terms, we need to program them unambiguously into a computer. The basis of our formal systems is Set Theory.

**Compute the consequences.** This is more or less straightforward if you are given numerical values of certain quantities; the heavy lifting here is done by the theory of functions, computation, calculus, differential equations, numerical analysis, etc. But we need those numerical values. The Bayesian solution is to specify a state of knowledge about what those quantities might be, a prior distribution, where some values have higher probability than others because we think those values are more reasonable (*not more common under repetition*, just more reasonable in this case).
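As a concrete sketch (the quantity and all the numbers here are hypothetical), a prior in this sense is just a density that puts more weight on values we consider more reasonable:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution -- here used as a prior."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical quantity: a decay rate we believe is near 2.0 but are unsure about.
def prior(rate):
    return normal_pdf(rate, mu=2.0, sigma=0.5)

# Values near 2.0 get more prior density than distant ones -- "more reasonable",
# not "more frequent under repetition".
assert prior(2.0) > prior(3.5)
```

Nothing about this density is a claim about long-run frequencies; it encodes a state of knowledge before seeing data.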

**Compare predictions to experiment.** To a Bayesian this means seeing which values of those quantities are needed to make the predictions come “close” to the outcomes of the experiment, and not only that, but also seeing how high the outcomes fall in the probability distribution around the prediction (how precisely the best version of the model predicts reality). When there are several models, the ones that put the outcomes in a high-probability region are going to get higher post-data probability; that is, we automatically assign probabilities to our several models based on how well they predict.
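The model-comparison step can be sketched in a few lines (the two models, the data, and the noise level are all invented for illustration): each model's posterior probability is its prior times how probable it made the observed data, renormalized.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of a normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

data = [1.9, 2.1, 2.0, 2.2]  # hypothetical experimental outcomes

# Two competing models: each predicts the data with Gaussian noise (sd = 0.2)
# around a different central value.
models = {"A": 2.0, "B": 3.0}
prior = {"A": 0.5, "B": 0.5}

log_like = {m: sum(normal_logpdf(x, mu, 0.2) for x in data) for m, mu in models.items()}
unnorm = {m: prior[m] * math.exp(log_like[m]) for m in models}
z = sum(unnorm.values())
posterior = {m: unnorm[m] / z for m in models}

# Model A puts the observed outcomes in a high-probability region, so it
# automatically collects almost all the post-data probability.
assert posterior["A"] > 0.99
```

This is exactly the “automatically put probabilities over our several models” step: no extra machinery is needed beyond the product and sum rules.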

What parts are special to science? Certainly *guessing* is not unique to science; the history of Greek, Norse, Native American, and other mythologies shows that we like to guess a lot.

*Computing the Consequences* isn’t unique to science either. Acupuncture / Chinese Medicine is largely guessing explanations in terms of Qi, hot vs. cold foods, meridian lines, and so forth… and then computing which spices or herbs or special tinctures or needle locations the model recommends.

**Compare Predictions to Experiment** is really the essence of what makes science special, and in order to do this step, we need a meaning for “compare”. The Bayesian solution is to force the model builder to use some information to specify how well the model should predict. In other words, what’s the “best case”? Specifically, the quantity p(Data | Params) should be taken as a specification of how probable it would be to observe the Data as the outcomes of experiments if you knew *precisely* the correct quantities to plug into Params. The fact that we don’t put delta functions over the Data values reflects our knowledge that we *don’t* expect our models to predict the output of our instruments *exactly*.
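A minimal sketch of such a specification (the noise scale here is an invented example, not a recommendation): even with exactly the right Params, the likelihood spreads probability over a range of instrument readings rather than concentrating it on one value.

```python
import math

def likelihood(datum, prediction, noise_sd=0.1):
    """p(datum | params): density of observing `datum` when the model,
    with exactly the correct parameters, predicts `prediction`.
    The finite noise_sd encodes that we don't expect exact agreement --
    this is the "no delta functions" point."""
    z = (datum - prediction) / noise_sd
    return math.exp(-0.5 * z * z) / (noise_sd * math.sqrt(2 * math.pi))

# A reading near the prediction is far more probable than one far away,
# but even an exact match gets finite (not delta-function) density.
assert likelihood(1.02, 1.0) > likelihood(1.30, 1.0)
```

Choosing `noise_sd` is itself a modeling act: it is where the builder states the “best case” precision of the model-plus-instrument combination.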

So, what do Cox’s axioms teach us? Really just this: if you want to use a real number to describe a degree of plausibility about anything you don’t know, and you want it to agree with the Boolean logic of YES vs. NO in the limit, you should do your specifications of degrees of plausibility using probability theory.
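In symbols, the content of the theorem is that any such real-valued plausibility assignment must, up to a monotone rescaling, obey the ordinary sum and product rules of probability:

```latex
% Sum rule: given context C, the plausibilities of A and not-A exhaust certainty
p(A \mid C) + p(\neg A \mid C) = 1
% Product rule: joint plausibility factors through the conditional
p(A \wedge B \mid C) = p(A \mid B \wedge C)\, p(B \mid C)
```

The Boolean limit is recovered when the plausibilities are pushed to the endpoints 0 and 1.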

Cox doesn’t tell you anything much about how to Guess, or how to Compute The Consequences (except that you should also compute the probability consequences in a certain way), but it does have a lot to say about how you should Compare Predictions to Experiment.


I really like this post.

Just to be annoying – I agree with Feynman’s sentiment. I agree you can use the informal Cox/Jaynes approach in this sort of way.

But there are many other ways of implementing Feynman’s advice. My issue is not that pragmatic Bayes isn’t a good thing, it’s that people claim rigorous support for it from things like Cox’s theorem. I don’t see it providing this justification. There are massive gaps between what it actually says, how that bears on foundational questions, and what people take it to say.

It seems like logical/philosophical window dressing used to sell an approach. Like a shampoo commercial or something. You question the basis and the response is the equivalent of ‘yeah but you just wash your hair with it and it works fine, probably won’t kill you’. I use shampoo too, but it’s just shampoo. I probably took this metaphor too far…

The thing is, there are no other ways of implementing Feynman’s advice that involve assigning a densely graded degree of support for truth that also agrees with 0/1 true/false Boolean logic in the limit. That’s the value of Cox’s theorem: it’s a uniqueness result for the algebraic properties of such a system.

That’s like saying there are no other ways of skinning a cat that involve skinning a cat in this particular way…and also that ‘this particular way’ may not actually refer to skinning a cat, but maybe just sharpening knives or something!

Anyway, I suppose we aren’t going to agree anytime soon.

I think you’re right, we’re not going to agree, and the fundamental thing seems to be that I really do WANT a system for assigning plausibilities that agrees with 1/0 Boolean logic, and you aren’t really so committed to that basic idea.

To stick with the cat skinning metaphor, it’s like saying “there are no other ways of skinning a cat that consistently produce a complete skin with no punctures that can be effectively tanned other than to use a skinning knife” and you’re saying “gee but this guy over here effectively removes the skin from cats in thin strips”

If you want something else, then Cox’s theorem isn’t that helpful, but if you want plausibility, Cox’s theorem tells you to stop fiddling around with NHST and p-values and frequencies, because frequencies and plausibility are not the same thing, even though they confusingly have the same kind of algebra.

Oh well, I’m sure we could agree on other things!

But again, Cox’s theorem doesn’t say that – Fisherian p-values refer to the adequacy of a statistical model, which you’ve said isn’t covered by Cox, and Cox’s theorem does relate to frequencies (of propositions), e.g.

“In fact, Cox pointed this out in his 1961 book The Algebra of Probable Inference, quoting Boole in Footnote 5, p. 101. In this passage, Boole not only makes the connection between the frequentist and logical interpretations of probability, he suggests that it is necessary—which is the point of Cox’s Theorem.”

(from the meaningness post).

Statisticians of all stripes accept (or should accept) probability applied to simple propositions. The question is what can and can’t be represented by simple propositions.

For example a ‘state of information’ is not a collection of propositions, as discussed. It does allow you to assign probabilities to propositions, i.e. it is a probability model.

One could use a Fisherian p-value to make a statement about the probability model (state of information) itself, rather than propositions within the model. In fact I think this is the most sensible use of p-value style reasoning (whether formal or informal).

I think the biggest problem with p-values as a measure of a model’s adequacy is this: p-values measure the adequacy of a frequency distribution as a description of the frequencies with which things actually occur, whereas Bayesian models are measures of plausibility, describing what a person who accepts the state of information (even if only for the purposes of argument, not as some subjective truth inside their soul) should think about the plausibility of statements post-data.

As Jaynes says “It is therefore highly illogical to speak of ‘verifying’ (3.8 [the Bernoulli urn equation]) by performing experiments with the urn; that would be like trying to verify a boy’s love for his dog by performing experiments on the dog.”

So, when you’re doing inference on a frequency distribution, you can then check the adequacy of your posterior distribution by comparing the frequency distributions you find to the data they supposedly generate, using p-values. But when you’re not doing inference on a frequency distribution… it just makes no sense to use p-values. It’s a category error, like asking whether the temperature of a frying pan is at least 3 meters.
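A minimal sketch of the legitimate use (the data and the choice of test statistic here are invented for illustration): when the inferred object really is a frequency distribution, a predictive p-value just asks how often replicated data drawn from the fitted distribution look at least as extreme as the observed data.

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

observed = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]   # hypothetical coin-flip data
p_hat = sum(observed) / len(observed)        # fitted frequency of heads

def statistic(seq):
    """Test statistic: number of heads."""
    return sum(seq)

# Simulate replicated datasets from the fitted frequency distribution and
# count how often they are at least as extreme as the observed data.
reps = 10_000
t_obs = statistic(observed)
extreme = sum(
    statistic([1 if random.random() < p_hat else 0 for _ in observed]) >= t_obs
    for _ in range(reps)
)
p_value = extreme / reps

assert 0.0 < p_value < 1.0  # a middling p-value: no evidence of inadequacy
```

Here the p-value is answering a frequency question about a frequency model; applying the same calculation to a plausibility distribution is exactly the category error described above.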

P-values and similar reasoning have nothing to do with ‘verifying’ anything. And the point is not to match arbitrary frequency distributions, but to choose appropriate statistics. I’m no frequentist, but I think this is another topic Jaynes was confused on.

Per Martin-Löf *defined* a sequence of numbers as random if some “most powerful computable test” gives a sufficiently large p-value, so that the sequence can’t be rejected as random (he gives a non-constructive proof of the existence of such a test). This was shown to be equivalent to Kolmogorov’s version.

So, by this definition, the p-value “verifies” whether a sequence of numbers is a random sequence from a given distribution or not.

See post here: http://models.street-artists.org/2013/03/13/per-martin-lof-and-a-definition-of-random-sequences/ which has a link to the paper.

By this definition, the p-value MUST be useful for “verifying” that a sequence is from a given frequency distribution. However, by this definition there is absolutely NO reason to think it verifies that a plausibility distribution is “adequate”. For “adequate” you need some notion of “goodness”, and the p-value is only a notion of “goodness” when you’re searching for a model of frequencies.

Random with respect to a test, as in my reference to choice of statistic. Also still nothing to do with ‘verifying’ in the sense I used it. Arguing with Bayesians about Frequentism is as frustrating as arguing about Bayes with Frequentists…I really need to stop

Well, early on in the Cox post you sort of promised an argument in the future, but then only gave a few quick preliminary comments, pointed to a few other people’s blogs, brought up some important issues, etc., but never put it together into what I’d call a complete philosophical position.

I think if you can put some specific logical argument for a particular position of how it all works, I could better understand where you’re coming from. So far I think you agree with me that Cox’s theorem provides a logical basis for updating degrees of plausibility over factual statements. Then, you seem to say that outside of factual statements, there are additional questions that can’t be answered, and I agreed, specifically I put forward a framework from Feynman of “guess”, “compute”, “check” and agreed that both guess and compute have little to do with Bayesian probability theory (except that if you accept Bayesian theory, you should compute probability consequences in a certain way). You seem to have other issues of interest, but you haven’t really said what those questions are, or how you think they should be answered. Or, if you have, I at least haven’t understood the specifics.

Fair enough. To summarise my basic, not especially constructive position:

It is perfectly possible to accept Cox’s theorem as stated by Van Horn, accept the guidelines given by Feynman and consistently use non-Bayesian statistical methods.

That is to say, the issues addressed by Cox/Van Horn appear to relate to statistical inference, but in fact address narrower questions of probability theory and propositional logic that should be acceptable to anyone using probability calculations.

On the other hand, the issues addressed by Feynman are sufficiently broad to accommodate a wide range of positions on statistical/scientific inference methodology.

So, while interesting, none of these are arguments *for Bayesian statistics or Bayesian epistemology etc as opposed to a number of alternatives*.

Now, do I have a better, fully-worked alternative? Nope! But I’ll let you know if I ever get one 🙂