I have an intuitive argument that says that every function of many finite-variance random variables can be expressed to good approximation as a different function of O(6) random variables, where O(6) means "on the order of" or "a small multiple of" 6.

The argument is related to a recent article in American Scientist about high dimensional geometry.

According to Wikipedia, the formulas for the (hyper) surface area $S$ of an $n$-sphere and the (hyper) volume $V$ of an $n$-ball of radius $R$ are:

$S_n(R) = \frac{2 \pi^{(n+1)/2}}{\Gamma ((n+1)/2)} R^n$ and

$V_n(R) = \frac{\pi^{n/2}}{\Gamma(n/2+1)} R^n$

Notice that both quantities go to zero quickly for large $n$, because the gamma function in the denominator eventually grows faster than the power of $\pi$ in the numerator. So not only does the volume of a unit sphere go to zero as a fraction of the containing cube, it goes to zero, period.
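A quick numerical check of this collapse (a sketch in Python; the function names are my own):

```python
import math

def sphere_surface(n, R=1.0):
    """Surface area S_n(R) of the n-sphere of radius R (the boundary of the (n+1)-ball)."""
    return 2 * math.pi ** ((n + 1) / 2) / math.gamma((n + 1) / 2) * R ** n

def ball_volume(n, R=1.0):
    """Volume V_n(R) of the n-ball of radius R."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * R ** n

# Both quantities rise at first, then the gamma function wins and they collapse to zero.
for n in (2, 5, 10, 20, 50, 100):
    print(f"n={n:3d}  V_n={ball_volume(n):.3e}  S_n={sphere_surface(n):.3e}")
```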

Now to random variables. Imagine you have $N$ independent random variables, each with mean zero and finite variance, which we normalize to 1. You are using these random variables to model the uncertainty in some system, so you have a function $F(\vec \xi)$ that depends on them (the function $F$ also absorbs the scale and shift parameters, which is why we can assume mean 0 and variance 1 for our random variables). We assume independence because we want the minimum set of random variables needed to model the randomness in our process, and it's easy to create further dependent random variables from the independent ones by combining them.

The squared radius of the vector, $r^2(\vec \xi) = \sum_i \xi_i^2$, has mean $N$ because each $\xi_i^2$ independently has mean 1. The variance of $r^2(\vec \xi)$ is the variance of a sum of $N$ independent random variables, which is $< N\, {\rm max}_i({\rm var}(\xi_i^2))$, where ${\rm var}(\xi_i^2)$ is determined by the 4th moment of $\xi_i$. Therefore the standard deviation of $r^2$ is $O(\sqrt{N})$.
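This concentration is easy to see in a quick simulation (a sketch; I use standard normal $\xi_i$, for which ${\rm var}(\xi_i^2) = 2$):

```python
import random
import statistics

def squared_radius_stats(N, trials=2000, seed=0):
    """Sample r^2 = sum_i xi_i^2 for N iid standard normals; return (mean, stdev)."""
    rng = random.Random(seed)
    samples = [sum(rng.gauss(0, 1) ** 2 for _ in range(N)) for _ in range(trials)]
    return statistics.mean(samples), statistics.stdev(samples)

# The mean grows like N while the standard deviation grows only like sqrt(2N).
for N in (10, 100, 1000):
    m, s = squared_radius_stats(N)
    print(f"N={N:5d}  mean={m:8.1f}  stdev={s:6.1f}  stdev/sqrt(N)={s / N ** 0.5:.2f}")
```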

Now with only $O(\sqrt{N})$ deviation around a mean squared radius of $N$, the radius itself is $\sqrt{N}$ with only $O(1)$ deviation, so the sample points of this process will all lie in a thin band essentially on the surface of an $(N-1)$-sphere. By reparameterizing our vector into a radius $r$ and $N-1$ other random variables with some dependence, we can treat $r$ as if it were a constant and ignore its randomness to a good approximation, reducing our problem to an $(N-1)$-dimensional one at the cost of introducing some dependence among the remaining variables. However, if $N$ is large the dependence is small, and we can repeat the process to get $N-2$ variables. We can continue until our assumptions — that the dependence is small and that the variance in the radius is small (as a fraction of the radius) — are violated at a scale that we care about.
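The relative thinness of the band is the whole justification for treating $r$ as a constant, and it can be measured directly (a sketch, again with standard normal $\xi_i$):

```python
import random
import statistics

def radial_spread(N, trials=2000, seed=1):
    """Relative spread stdev(r)/mean(r) of the radius r = sqrt(sum_i xi_i^2)."""
    rng = random.Random(seed)
    radii = [sum(rng.gauss(0, 1) ** 2 for _ in range(N)) ** 0.5 for _ in range(trials)]
    return statistics.stdev(radii) / statistics.mean(radii)

# The band gets relatively thinner as N grows (roughly like 1/sqrt(2N)),
# so the reduction step is safe for large N and breaks down for small N.
for N in (4, 16, 64, 256):
    print(f"N={N:3d}  stdev(r)/mean(r)={radial_spread(N):.3f}")
```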

Now back to the hyper-geometry problem. The volume of the unit ball is maximized at $n = 5$ and the surface area $S_n$ at $n = 6$, so both formulas peak in the vicinity of 5 to 6 dimensions. This strongly suggests a distinguished scale for the number of variables at which our dimension reduction process can no longer take place. After all, we are using spherical geometry to argue for the dimension reduction, and when we get near this critical dimension we can no longer argue that the variance in the radial direction doesn't matter: the volume of the $(n-1)$-dimensional sphere is on the same order as the volume of the radial confidence band around the $n$-sphere.
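The peak dimensions can be checked directly (a small sketch; note that under this post's convention $S_n$ is the surface area of the $n$-sphere, the boundary of the $(n+1)$-ball):

```python
import math

def ball_volume(n):
    """Volume of the unit n-ball."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def sphere_surface(n):
    """Surface area of the unit n-sphere."""
    return 2 * math.pi ** ((n + 1) / 2) / math.gamma((n + 1) / 2)

# Find the integer dimension where each formula peaks.
vol_peak = max(range(1, 30), key=ball_volume)
surf_peak = max(range(1, 30), key=sphere_surface)
print(vol_peak, surf_peak)  # -> 5 6
```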

So if we're going to model a random process, we need some small multiple of, say, 6 dimensions no matter how many random dimensions the problem really has (e.g. $10^{23}$ molecules in statistical mechanics). Let's say the multiple is on the order of 10, which means that 64 independent random variables ought to be enough for anyone (with apologies to Bill Gates, who never actually said that 640k of RAM would be enough for anyone).

Seriously though: this is not a rigorous argument, but it is certainly suggestive that we will often be successful using something like 12 to 24 independent random variables as the "sources of uncertainty" in our model, even when the problem in fact has many, many more.

2 Responses
1. November 7, 2011

I need to re-read this more carefully but I like your intuition, especially after I just read Brian Hayes' recent post on hyperspheres:
http://bit-player.org/2011/the-n-ball-game
Is it a coincidence that both of your posts came out at a similar time?

2. November 7, 2011

I've been thinking about this hypersphere issue for a while since I read a paper on the use of Concentration of Measure inequalities in reliability engineering. But the impetus to write a blog post on it was in fact reading Brian Hayes' article in American Scientist which I get in paper form. I wasn't aware he also had a blog, which I have now added to my RSS feeds. Thanks!

Also note that my early versions had some sloppy deductions about the variance of the radius, but this version is sound.