I was reading and commenting on the Entsophy Blog today and I thought up a little question about the Gaussian distribution in relation to his theory of what a probability distribution means.

Suppose we have some measurement device, and we know it has a limited resolution. Say a wooden ruler whose smallest mark is 1 mm. If we read 32 mm as the length of something, then we know that the real length is more or less $32 \pm \delta$ mm. We also have an intuition that $\delta$ should be around, say, 1.

We could model the real length as uniformly distributed between 31 and 33 mm, or maybe between 30 and 34 mm, or maybe... We don't really want to completely exclude any given length. Since we don't want to exclude anything, we might choose a Gaussian distribution with $\sigma = 1$.

Let's interpret the unit Gaussian distribution as a mixture of different uniform distributions centered on 0. So we have a half width $w$, and the height of a uniform distribution on $[-w,w]$ is $1/(2w)$. If we're using a Gaussian to capture the idea that we'd like to put a fixed bound on how far from zero something can be, but we just don't know exactly what that fixed bound should be, then we're talking about a mixture of uniforms. At any given value $x$ the density of the mixture should be $\exp(-x^2/2)/\sqrt{2\pi}$. Since uniforms only contribute at $x$ if their half width is at least $x$, we have (let's assume $x > 0$, and write $a(w)$ for the mixing density over half widths)

$$\frac{e^{-x^2/2}}{\sqrt{2\pi}} = \int_x^{\infty} \frac{a(w)}{2w}\, dw.$$

So we want to find out what this means about the mixing distribution $a(w)$. Playing around in Maxima gives $a(w) \propto w^2 e^{-w^2/2}$, which when normalized happens to be a chi distribution with 3 degrees of freedom. A chi-distributed random variable with 3 degrees of freedom is always positive and has its high-probability region roughly in $[0.5, 2.5]$. In other words, we don't know exactly how wide our finite uniform interval should be, but if we let the half width be an unknown positive (chi-distributed) quantity which won't often be much bigger than 2 or 3, we recover the normal distribution as a model for measurement uncertainty. In this sense the normal distribution has a kind of "bounded and O(1)" quality to it even though it's not actually bounded.
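This is easy to check by simulation. Here's a minimal sketch (pure Python; the sample size and seed are arbitrary choices): draw the half width as a chi(3) variable, i.e. the norm of three independent standard normals, draw the error uniformly on $[-w, w]$, and compare summary statistics against the standard normal.

```python
import math
import random

random.seed(1)
N = 200_000

def chi3():
    # A chi-distributed variable with 3 degrees of freedom is the
    # Euclidean norm of three independent standard normals.
    return math.sqrt(sum(random.gauss(0, 1) ** 2 for _ in range(3)))

# Mixture: pick an interval half width W ~ chi(3), then U ~ Unif(-W, W).
samples = []
for _ in range(N):
    w = chi3()
    samples.append(random.uniform(-w, w))

mean = sum(samples) / N
var = sum(x * x for x in samples) / N
frac_within_1 = sum(abs(x) <= 1 for x in samples) / N

# For a standard normal these should come out close to
# 0, 1, and 0.683 respectively.
print(round(mean, 3), round(var, 3), round(frac_within_1, 3))
```

Note that the variance works out exactly: a Unif(-w, w) has variance $w^2/3$, and $E[W^2] = 3$ for a chi(3) variable, so the mixture has variance 1.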

6 Responses
1. August 15, 2013

I think that x should be a w in a(w).

That's a really interesting motivation for the chi-squared distribution.

2. August 15, 2013

Typo fixed, thanks. It's actually the "chi" distribution, not the "chi-squared," but yes, it's a really interesting idea that the Gaussian is basically the result of assuming an error comes uniformly from some range, with the width of the range being an uncertain hyperparameter that has a chi distribution as its prior.

If you choose a t distribution, for example, you'll find that a(w) is a different distribution, most likely one that decays polynomially rather than exponentially. I also suspect it will have a mode at or close to zero. In other words, rather than being fairly confident the width lies in some narrow range (neither zero nor very large), we'd think the width should be small but couldn't decide very well; it could be quite large as well...

August 29, 2013

I was playing around with this too. I found a nice general decomposition:

Suppose $f(x)$, the pdf for a real-valued $X$, is unimodal and symmetric around 0. If $W$ is positive-valued with pdf $g(w) = -2w f'(w)$ and $U \sim \mathrm{Unif}(-W, W)$, then the marginal distribution of $U$ is the same as that of $X$.
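As a sanity check, here's a sketch applying this decomposition to a distribution not mentioned above, the standard Laplace (chosen because everything works out in closed form). Taking the decomposition in the normalized form $g(w) = -2w f'(w)$, the Laplace pdf gives $g(w) = w e^{-w}$, a Gamma(2,1) density, and numerically integrating the mixture recovers $f$:

```python
import math

def f(x):
    # Standard Laplace pdf: symmetric and unimodal around 0.
    return 0.5 * math.exp(-abs(x))

def g(w):
    # Mixing density from the decomposition, g(w) = -2 w f'(w).
    # For the Laplace (w > 0), f'(w) = -0.5 * exp(-w), so
    # g(w) = w * exp(-w), which is a Gamma(2, 1) density.
    return w * math.exp(-w)

def mixture_pdf(u, upper=40.0, steps=100_000):
    # Marginal of U, where W ~ g and U | W ~ Unif(-W, W):
    #   p(u) = integral over w > |u| of g(w) / (2 w) dw,
    # evaluated with the trapezoid rule, truncated at `upper`.
    # (Valid for u != 0, so the integrand stays finite.)
    a = abs(u)
    integrand = lambda w: g(w) / (2 * w)
    h = (upper - a) / steps
    s = 0.5 * (integrand(a) + integrand(upper))
    for i in range(1, steps):
        s += integrand(a + i * h)
    return s * h

for u in (0.5, 1.0, 2.0):
    print(u, mixture_pdf(u), f(u))  # the two densities should agree
```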

Using this decomposition we can see how the t distribution behaves: the mixing density is $g(x) \propto \frac{n+1}{n}\, x^2 \left(1 + \frac{x^2}{n}\right)^{-\left(\frac{n+1}{2}+1\right)}$, where $n$ is the degrees of freedom. Asymptotically for large $x$ this decays like $x^{-(n+1)}$, as opposed to the Gaussian case, whose chi mixing distribution decays like $\exp(-x^2/2)$.
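Here's a numeric sketch of that claim for $n = 5$ (an arbitrary choice), again assuming the mixing density is obtained as $g(w) = -2w f'(w)$: the mixture of uniforms integrates back to the t pdf, and the tail of $g$ decays with log-log slope $-(n+1)$.

```python
import math

n = 5  # degrees of freedom; an arbitrary example choice

# Normalizing constant of the Student-t pdf with n degrees of freedom.
C = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))

def t_pdf(x):
    return C * (1 + x * x / n) ** (-(n + 1) / 2)

def g(w):
    # Mixing density g(w) = -2 w * d/dw t_pdf(w):
    #   g(w) = 2 C (n+1)/n * w^2 * (1 + w^2/n)^(-((n+1)/2 + 1))
    return 2 * C * (n + 1) / n * w * w * (1 + w * w / n) ** (-((n + 1) / 2 + 1))

def mixture_pdf(u, upper=200.0, steps=200_000):
    # Trapezoid rule for p(u) = integral over w > |u| of g(w)/(2w) dw
    # (valid for u != 0), truncated at `upper`.
    a = abs(u)
    integrand = lambda w: g(w) / (2 * w)
    h = (upper - a) / steps
    s = 0.5 * (integrand(a) + integrand(upper))
    for i in range(1, steps):
        s += integrand(a + i * h)
    return s * h

# The mixture of uniforms recovers the t density...
for u in (0.5, 1.0, 2.0):
    print(u, mixture_pdf(u), t_pdf(u))

# ...and the mixing density's tail decays polynomially, like w^-(n+1):
slope = math.log(g(100.0) / g(200.0)) / math.log(100.0 / 200.0)
print(slope)  # should be close to -(n + 1) = -6
```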