Product(p(Data_i | Perturbed_Data_i, model_error_size), i)

say, using a normal model for the "model error".

This model could then be included in a mixture model in which the probability associated with it is a parameter whose hyper-prior assigns it relatively low probability. If this component winds up having high posterior probability, that's an indication that your predictive models are not very good compared to how good you think they should be.
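A minimal sketch of the likelihood above, assuming the normal "model error" just mentioned; the function name and the numbers are illustrative, not taken from any real analysis:

```python
# Sketch of: Product(p(Data_i | Perturbed_Data_i, model_error_size), i),
# with a normal model for the "model error". Computed on the log scale
# to avoid underflow. All names and values here are illustrative.
import math

def model_error_log_likelihood(data, predictions, model_error_size):
    """Log of Product_i Normal(data_i | predictions_i, model_error_size)."""
    s = model_error_size
    return sum(
        -0.5 * math.log(2 * math.pi * s**2) - (d - p)**2 / (2 * s**2)
        for d, p in zip(data, predictions)
    )

# Example: predictions off by a fraction of the error scale
print(model_error_log_likelihood([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], 1.0))
```

Plugging this component into a mixture then just means multiplying its likelihood by its mixture weight alongside the other models' terms.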

p(Data | Model1) p(Model1) + p(Data | Model2) p(Model2) + ...

This is a consistent way to analyze them and compare them together. The question is: what if none of the models are "adequate"? For example, none of them have any portion of the parameter space where they're particularly predictive. I think to deal with this you have to remember that in pretty much every situation, you're working with a truncated version of some ideal scientific process. You're always working with a small finite set of predictor variables, a small finite model specification, limited computing time, etc. Sometimes it just makes sense to go back to the drawing board and seek out new models.
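The mixture formula above gives posterior model probabilities directly via Bayes' rule; here is a small sketch, with made-up likelihood and prior values:

```python
# p(Data) = p(Data|Model1)p(Model1) + p(Data|Model2)p(Model2) + ...
# Posterior model probabilities follow from Bayes' rule.
# The numbers below are illustrative, not from any real models.
def posterior_model_probs(likelihoods, priors):
    """likelihoods[k] = p(Data | Model_k), priors[k] = p(Model_k)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)  # p(Data) under the whole mixture
    return [j / total for j in joint]

# Example: Model2 explains the data ~100x better, so it dominates
print(posterior_model_probs([1e-5, 1e-3], [0.5, 0.5]))
```

Note that this comparison is always relative: the best of a bad lot can still get posterior probability near 1, which is exactly why "adequacy" is a separate question.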

One question is: how do you detect this situation? In the absence of a specific model you can include in the Bayesian calculations, there's no way to just put in a generic "something else". I think Gelman has some useful ideas about using generative models with high-posterior-probability parameter values to compare fake data with real data, and so forth. Those techniques *do* help. Graphical techniques for model-data comparison definitely help. I definitely don't have a complete theory of model checking to offer. In many cases I've just wound up with a model that isn't terrible and worked with it until something better occurred to me. In part, it depends on what your needs are and how the model is getting used.

So, in this sense, model comparison can be done within Bayes, but model "adequacy" really has to be done outside Bayes.

Now, given a single probability distribution (state of information) we can make inferences 'within' this single probability distribution, e.g. calculate conditional probabilities given a single global probability distribution.

So, what would lead you to reject the single overall joint distribution as 'not good enough'? How would you compare two competing full joint distributions or compare either to reality?
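To make "inference within a single joint distribution" concrete: given a full joint over two binary propositions, any conditional probability is just a ratio of sums over the table. The numbers are illustrative:

```python
# A full joint distribution over two binary propositions A and B.
# Every 'within-distribution' inference is a ratio of sums over it.
joint = {  # p(A, B) for all four truth-value combinations
    (True, True): 0.30, (True, False): 0.10,
    (False, True): 0.20, (False, False): 0.40,
}

def cond_prob_a_given_b(joint, b_val):
    """p(A | B = b_val), computed from the joint alone."""
    num = sum(p for (a, b), p in joint.items() if a and b == b_val)
    den = sum(p for (a, b), p in joint.items() if b == b_val)
    return num / den

print(cond_prob_a_given_b(joint, True))   # 0.30 / 0.50 = 0.6
```

Nothing inside this calculation can tell you whether the joint itself is any good; that judgment has to come from outside, which is exactly the question being raised.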

The fact that priors are not universal (that is, that people can come up with different priors from a similar state of information) is well known, so we still need some "scientific reasoning" outside probability theory, which I think everyone admits (all true Scotsmen, at least 🙂). But Gödel gives us a reason to think that no theory is complete in and of itself, which means we shouldn't be surprised by this fact. And reading Pinter's book, it's just taken for granted in modern set theory that if we want to make correspondences between mathematical statements and the world around us, that occurs outside our formal system, because a formal system is just a very limited language about symbolic formulas. So in some sense, the point of science is to figure out which models allow you to make logical deductions that then also turn out to be valid truths about the universe (i.e. the model is good at predicting what happens in the world). The model formulation, the decisions about what to measure, how to measure it, etc. occur outside probability theory; but if you want to account (with a single real number) for how much the weight of all your knowledge is for or against a particular statement about the world being true, you should use Bayesian updating to get a consistent system.

I think the goal of "formalizing all scientific reasoning" is a non-starter (thanks, Gödel!), so for me Cox's axioms work to say: "when you can give probabilities for base objects, and you add some data, here's how you should give new conditional probabilities, and they will be as consistent as possible (i.e. as consistent as ZFC, thanks to the bit Corey pointed out above)". It's a specification of the algebra you should use to think about real-number credence/plausibility/probability assignments. It's like telling all the accountants "use the rules of arithmetic". Sure, it's still the case that we need "good" accounting practices on top of that, but it's no good if some accountants just don't follow the arithmetic rules.

I think we've established that it's not logically required that you MUST accept R1-R5 either. So, basically the content of Cox's axioms is: "if you're going to work with probabilities to help you do the accounting of what is and isn't likely to be true out of a big database of facts... then you need to use the sum and product rules".
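The "arithmetic for accountants" point can be shown in a few lines: any well-formed joint distribution automatically coheres with the sum and product rules. The joint here is invented for illustration:

```python
# Check that the sum rule and product rule cohere for a joint p(a, b).
# The probabilities are illustrative.
joint = {("rain", "wet"): 0.35, ("rain", "dry"): 0.05,
         ("sun", "wet"): 0.10, ("sun", "dry"): 0.50}

def marginal_a(a):
    # sum rule: p(a) = sum over b of p(a, b)
    return sum(p for (x, _), p in joint.items() if x == a)

def cond_b_given_a(b, a):
    # conditional probability defined from the joint
    return joint[(a, b)] / marginal_a(a)

# product rule: p(a, b) = p(b | a) p(a), for every cell of the table
for (a, b), p in joint.items():
    assert abs(p - cond_b_given_a(b, a) * marginal_a(a)) < 1e-12
print("sum and product rules hold for this joint")
```

An "accountant" who assigned numbers violating these identities would be inconsistent in exactly the sense Cox's theorem rules out.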

That's not a theory of everything, but it's not nothing either!

A state of information X summarizes the information we have about some set of atomic propositions A, called the basis of X, and their relationships to each other. The domain of X is the logical closure of A, that is, the union of A and all compound propositions that involve only atomic propositions from A.

I take this to mean, we start with a bunch of propositions A, and then everything that A implies gets computed (theoretically), and then the stuff that is relevant to assigning probabilities is distilled out of it and that's the state of information used to assign the probabilities. So it's not the same as a set of simple propositions, but it is basically "everything that the atomic propositions imply logically about probabilities of stuff on the left side of the conditioning bar"
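One way to see the "logical closure" idea concretely: with a finite basis A, every compound proposition is determined by its truth value in each of the 2^|A| possible worlds, so a state of information can be represented as a probability assignment over those worlds, and it then fixes the probability of everything in the closure. A toy sketch, with invented numbers:

```python
# A state of information over the logical closure of a finite basis,
# represented as probabilities on the 2^|A| possible worlds.
# The basis names and probabilities are illustrative.
from itertools import product

basis = ["A1", "A2"]                      # atomic propositions
worlds = list(product([True, False], repeat=len(basis)))
state = {w: p for w, p in zip(worlds, [0.4, 0.1, 0.3, 0.2])}

def prob(compound):
    """Probability of any compound proposition, given as a function of a world."""
    return sum(p for w, p in state.items() if compound(dict(zip(basis, w))))

print(prob(lambda v: v["A1"] or v["A2"]))      # p(A1 v A2)
print(prob(lambda v: v["A1"] and not v["A2"]))  # p(A1 & ~A2)
```

So the state of information is not the list of atomic propositions itself, but everything those world-probabilities imply about any formula built from them.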

So, Cox/Bayes probability is the unique real-number system that satisfies R1-R5 for starting with some propositions about probabilities, and some data, and winding up with some new propositions about probabilities.
