There is some concern in the back of people's minds about Cox/Jaynes Bayes and things like continuous parameter spaces, infinite-dimensional models, and so forth. It comes down to questions about finite additivity. About the only good thing for me that came out of Dan Simpson posting on Andrew's blog about the "Gay Face" neural net baloney that came out this week is that he indirectly pointed me to a guest post on Xian's blog from a few years back. (To be clear, it's the gay face "research" baloney, not Dan's post, that I object to.)

So, here I want to try to organize some thoughts I have on finite-additivity-type stuff. I've mentioned in the past why I don't think this is actually an issue for science.

As regular blog readers know, I'm a fan of the IST (Internal Set Theory) form of Nonstandard Analysis. I think it connects models to formalism transparently, in a way that "typical" measure theory type math doesn't.

So, let's try to define some probability foundations in a Nonstandard Analysis setting. (Warning: I will probably do a poor job of this compared to professionals, but if I screw it up I am pretty confident it will be both detectable and fixable by a more serious formalist.)

First, let's work with a simple continuous-valued random variable $X$.

Define a set of grid points $\{x_i = -N + i\,dx\}$, where $N$ is a nonstandard (unlimited) positive integer, $dx = 1/N^2$, and $i \in \{0, 1, \dots, 2N^3\}$, so that the grid runs from $-N$ to $N$ in infinitesimal steps.

Now define a function $p(x_i)$ on the grid which is non-negative and satisfies $\sum_i p(x_i)\,dx = 1$.
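A computer can't hold a nonstandard $N$, but the construction can be mimicked with an ordinary integer standing in for it. This minimal sketch assumes the grid $x_i = -N + i\,dx$ with $dx = 1/N^2$ and $i = 0, 1, \dots, 2N^3$, so that it runs from $-N$ to $N$, and uses a bell-curve shape as a hypothetical unnormalized weight. The helper names `make_grid` and `normalize` are illustrative only.

```python
import math

def make_grid(N):
    """Grid x_i = -N + i*dx with dx = 1/N**2 and i = 0..2*N**3, covering [-N, N]."""
    dx = 1.0 / N**2
    return [-N + i * dx for i in range(2 * N**3 + 1)], dx

def normalize(weights, dx):
    """Rescale non-negative weights so that sum_i p(x_i) * dx == 1."""
    total = sum(weights) * dx
    return [w / total for w in weights]

# N = 10 is a (very) finite stand-in for a nonstandard integer.
xs, dx = make_grid(10)
# Hypothetical unnormalized weight: a bell-curve shape.
p = normalize([math.exp(-x * x / 2) for x in xs], dx)
print(len(xs), xs[0], xs[-1], sum(p) * dx)
```

Even with $N = 10$ the grid has $2N^3 + 1 = 2001$ points, which is a reminder that the nonstandard grid is a formal device rather than something you would compute with directly.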

Suppose the standardization $p^*(x)$ exists as a standard function, and that $p$ is well-behaved enough that the standard part of $\sum_i p(x_i)\,dx$ equals $\int p^*(x)\,dx$ (for example, no probability mass escapes out to unlimited $|x|$). Then $p^*(x)$ is a standard probability density function. The proof is simple: it's non-negative, it's integrable, and its integral is 1. Every standard integrable function that is non-negative and has integral 1 is a probability density, by the definition of a probability density.
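A finite analogue of standardization: as $dx$ shrinks, the grid density evaluated at limited $x$ settles down to a standard density. This sketch assumes the weights come from the shape $e^{-x^2/2}$ normalized on the grid $x_i = -N + i\,dx$, $dx = 1/N^2$, $i = 0, \dots, 2N^3$, with $N$ an ordinary integer, and measures the largest gap to the standard normal density on the fixed window $[-3, 3]$. The names `grid_density` and `max_gap_on` are hypothetical; this illustrates the idea rather than proving anything.

```python
import math

def std_normal_pdf(x):
    """Standard normal density, the standard function we expect to recover."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def grid_density(N):
    """Normalized grid density p(x_i) built from the shape exp(-x^2/2)."""
    dx = 1.0 / N**2
    xs = [-N + i * dx for i in range(2 * N**3 + 1)]
    w = [math.exp(-x * x / 2) for x in xs]
    total = sum(w) * dx
    return xs, [wi / total for wi in w], dx

def max_gap_on(N, lo=-3.0, hi=3.0):
    """Largest |p(x_i) - phi(x_i)| over grid points in a fixed, limited window."""
    xs, p, _ = grid_density(N)
    return max(abs(pi - std_normal_pdf(x)) for x, pi in zip(xs, p) if lo <= x <= hi)

for N in (4, 8, 16):
    print(N, max_gap_on(N))
```

At small $N$ the main error comes from the tail mass cut off beyond $\pm N$, which inflates $p$ slightly after normalization; that is the finite shadow of the "no mass escaping to unlimited $|x|$" condition.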

Now, suppose instead that $p$ takes on unlimited values in such a way that $p(x_i)\,dx$ is appreciable for some grid points, but suppose this only occurs at points infinitesimally close to some standard values $x_j$ (i.e., the "delta functions" are at standard locations). Then there is no standardization of this function. However, we can still define a probability measure on $X$ by setting, for every standard set $s$ of points in $X$,

$$\mu(s) = \operatorname{st}\Big(\sum_{x_i \in s} p(x_i)\,dx\Big),$$

where $\operatorname{st}$ denotes the standard part.

It's easy to check that this $\mu$ is a probability measure. First off, we're adding up non-negative values, and the sum over the whole grid equals 1, so the result is always in $[0,1]$. If $s_1$ and $s_2$ are disjoint standard sets, then $\mu(s_1) + \mu(s_2) = \mu(s_1 \cup s_2)$, because the sum that defines $\mu$ splits into a sum over the grid points in $s_1$ plus a sum over those in $s_2$. The same holds for the union of any $K$ disjoint standard sets with $K$ a standard integer: the $K$ sets are disjoint, so they partition the sum, and no grid point is double-counted. By Transfer, the corresponding statement about the internal grid sum holds for all $K$, including nonstandard $K$. For countable additivity, take a standard sequence of disjoint subsets $\{s_i\}$ whose union is the full set $S$. The partition argument gives the equality for every finite prefix $s_1, \dots, s_K$, and hence by induction for every $K$; passing from there to the whole sequence also needs the mass of the tail $\bigcup_{i>K} s_i$ to become negligible as $K$ grows, which is the continuity condition that separates countable from merely finite additivity.
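The definition of $\mu$ and the partition argument can be sketched with exact rational arithmetic on a small finite grid. This assumes $N = 4$ as a stand-in for a nonstandard integer, the grid $x_i = -N + i\,dx$ with $i = 0, \dots, 2N^3$, made-up non-negative weights for the smooth part, and one spike at $x = 0$ carrying mass $1/2$. Sets are represented as Python predicates, and `mu` and `interval` are hypothetical helper names. Since the grid is finite there is no standard part to take here, so `mu` is just the grid sum, and because everything is a `Fraction` the additivity checks are exact equalities. Note that this only exercises finite additivity for one fixed $K$.

```python
from fractions import Fraction

N = 4                                    # finite stand-in for a nonstandard integer
dx = Fraction(1, N**2)
xs = [-N + i * dx for i in range(2 * N**3 + 1)]

# Smooth part: arbitrary non-negative rational weights rescaled to carry total mass 1/2.
w = [Fraction(1, 1 + i % 5) for i in range(len(xs))]
w_total = sum(w) * dx
p = [Fraction(1, 2) * wi / w_total for wi in w]

# Spike at the standard location 0 with p(0) * dx = 1/2.
# With an infinitesimal dx, this p(0) would be unlimited.
j = xs.index(0)
p[j] += Fraction(1, 2) / dx

def mu(in_s):
    """Exact mu(s): sum of p(x_i) * dx over grid points x_i in s (s given as a predicate)."""
    return sum((pi * dx for x, pi in zip(xs, p) if in_s(x)), Fraction(0))

def interval(a, b):
    """Half-open interval [a, b) as a predicate."""
    return lambda x: a <= x < b

print(mu(lambda x: True))     # total mass of the grid
print(mu(interval(-1, 1)))    # contains the spike, so at least 1/2
print(mu(interval(1, 2)))     # misses the spike

# Finite additivity: disjoint sets split the grid sum, so nothing is double-counted.
s1, s2 = interval(-1, 0), interval(0, 3)

def in_union(x):
    return s1(x) or s2(x)

print(mu(s1) + mu(s2) == mu(in_union))

# K disjoint pieces covering the whole grid: cut points -N = c_0 < ... < c_K = N + 1.
K = 7
cuts = [-N + Fraction(k * (2 * N + 1), K) for k in range(K + 1)]
pieces = [interval(cuts[k], cuts[k + 1]) for k in range(K)]
print(sum(mu(s) for s in pieces) == 1)
```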

So, finite additivity may be problematic as a foundation for Bayesian statistics on continuous parameter spaces, but nonstandard-finite additivity is sufficient to give measure theory, and to boot, in the nonstandard case, every standard measure has a nonstandard density.
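On the real line, one way to see that a standard measure has a grid density is to let $p(x_i)\,dx$ be the mass the measure assigns to the grid cell ending at $x_i$. Atoms then become single spikes with appreciable $p(x_i)\,dx$, which is the delta-function situation described above. This sketch assumes a hypothetical standard measure (half standard normal, half a point mass at 0) given through its CDF, the grid $x_i = -N + i\,dx$ with $dx = 1/N^2$, and $N = 10$ as an ordinary stand-in; `mixture_cdf` and `density_from_cdf` are illustrative names.

```python
import math

def mixture_cdf(x):
    """Hypothetical standard measure: half N(0, 1), half a point mass at 0."""
    smooth = 0.5 * (1 + math.erf(x / math.sqrt(2)))
    atom = 1.0 if x >= 0 else 0.0
    return 0.5 * smooth + 0.5 * atom

def density_from_cdf(cdf, N):
    """p(x_i) * dx = mass of the cell (x_{i-1}, x_i]; the first cell also absorbs (-inf, x_0]."""
    dx = 1.0 / N**2
    xs = [-N + i * dx for i in range(2 * N**3 + 1)]
    masses = [cdf(xs[0])] + [cdf(b) - cdf(a) for a, b in zip(xs, xs[1:])]
    return xs, [m / dx for m in masses], dx

xs, p, dx = density_from_cdf(mixture_cdf, 10)
k = max(range(len(xs)), key=lambda i: p[i])
print(sum(p) * dx)        # total mass (the tail beyond x = N is negligible here)
print(xs[k], p[k] * dx)   # the atom shows up as one spike with p(x_k) * dx close to 1/2
```

Away from the atom, $p(x_i)$ is close to half the standard normal density, so the same recipe covers the absolutely continuous part and the atom at once.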

Now, suppose we're dealing with infinite-dimensional spaces, like the sample paths of Gaussian processes. There is no standard Lebesgue measure on infinite-dimensional spaces, but there is a perfectly fine Lebesgue measure on every finite-dimensional space of standard positive integer dimension $D$, and therefore, by Transfer, on every space of nonstandard positive integer dimension $D$ as well. Let $(x, f(x))$ be the graph of a sample path on a nonstandard grid of $D$ points in $x$ (with $D$ a nonstandard integer), so that $f$ is a vector of $D$ values. Let $f$ have a nonstandard density defined by the nonstandardly-discretized multivariate normal distribution. Then the standardization of a realization $f(x)$ is a Gaussian process sample path, because for every finite standard set of $x$ values the corresponding $f(x)$ values have a multivariate Gaussian distribution, which is the definition of a Gaussian process.
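A minimal sketch of the discretized Gaussian process, under assumptions the post doesn't specify: a squared-exponential covariance with length scale 0.3, a 21-point grid on $[0, 1]$ standing in for the nonstandard grid, and a small diagonal jitter for numerical stability. It draws from the multivariate normal through a hand-rolled Cholesky factor and checks that the values at two fixed $x$ locations have, empirically, the kernel's covariance, which is the finite-dimensional property the argument relies on. All function names here are illustrative.

```python
import math
import random

def se_kernel(a, b, length=0.3):
    """Squared-exponential covariance (an assumed choice of kernel)."""
    return math.exp(-((a - b) ** 2) / (2 * length**2))

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sample_path(L, rng):
    """One draw f = L z with z ~ N(0, I): multivariate normal with covariance L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]

D = 21                                   # finite stand-in for a nonstandard dimension
xs = [i / (D - 1) for i in range(D)]     # grid on [0, 1]
jitter = 1e-6                            # keeps the Cholesky factorization numerically stable
C = [[se_kernel(a, b) + (jitter if r == c else 0.0) for c, b in enumerate(xs)]
     for r, a in enumerate(xs)]
L = cholesky(C)

rng = random.Random(0)
draws = [sample_path(L, rng) for _ in range(4000)]

# Empirical covariance of f at x = 0.25 and x = 0.5 (the mean is known to be 0).
i, j = xs.index(0.25), xs.index(0.5)
cov_ij = sum(f[i] * f[j] for f in draws) / len(draws)
print(cov_ij, se_kernel(0.25, 0.5))
```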

The problem with standard measure theory is that when you build your formalism on taking limits, things change from one type of object to another under certain limits. For example, the normal(0,s) density with $s$ infinitesimal is a perfectly fine nonstandard density function, but in standard mathematics its $s \to 0$ limit is a "delta function measure", which is to say a measure over sets, not a function $f(x)$.
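To make this concrete: for small $s$, the normal(0,s) density evaluated on a grid much finer than $s$ is still just a function with a tall, narrow peak, yet its integral against a smooth test function $g$ approaches $g(0)$, which is how a delta measure acts. This is a finite-$s$ sketch with a hypothetical test function $g(x) = \cos x + x$ (so $g(0) = 1$, and the exact integral is $e^{-s^2/2}$); in the nonstandard picture $s$ would be infinitesimal and $dx$ infinitesimal relative to $s$. The helper name `smear` is illustrative.

```python
import math

def normal_pdf(x, s):
    """Density of normal(0, s)."""
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))

def smear(g, s, cells=2000):
    """Riemann sum of normal_pdf(x, s) * g(x) over [-10 s, 10 s] in `cells` steps (dx = s / 100 by default)."""
    dx = 20 * s / cells
    return sum(normal_pdf(-10 * s + k * dx, s) * g(-10 * s + k * dx)
               for k in range(cells + 1)) * dx

def g(x):
    """Smooth test function with g(0) = 1."""
    return math.cos(x) + x

for s in (1.0, 0.1, 0.001):
    print(s, normal_pdf(0.0, s), smear(g, s))
```

The peak height $1/(s\sqrt{2\pi})$ grows without bound while the smeared value tends to $g(0)$; the nonstandard version keeps both facts in view at once instead of trading the function for a measure in the limit.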