The probability density vs a sequence of random numbers

2016 May 6
by Daniel Lakeland

How can we map back and forth between (nonstandard) density functions, and random number sequences?

Let x_i be a finite sequence of numbers of length L with L a nonstandard integer. Let -N,-N+dx,..-N+i dx,..N be a grid of values on the real line with N a nonstandard integer and dx = 1/N so that there are 2N^2+1 grid point values.

Let L/N^2 be nonstandard (L is ULTRA ULTRA ULTRA BIG, so that there are roughly "a nonstandard number of values for each grid point").

Consider a "randomly chosen" permutation of the integers from 1..L where we intuitively understand somehow how this is done (ie. we pick one of the permutations intuitively). Call this sequence c_i (for choice) then imagine choosing at random from the sequence x_i by choosing the integers in order i=1..L calculating c_i and choosing the appropriate x_{c_i}. Assert without proof that x_{c_i} is an IID random number generator for some distribution when c is chosen "well enough". What distribution?

For each value, determine which grid point it falls in, and increment a counter for that grid point until we get to the end of the sequence. Call this counter n(x). Construct the nonstandard function n(x)/L this is the nonstandard probability density associated with x_i. When the standardization of this function exists, use it. Otherwise, perhaps you have something like a delta-function, where all the values are infinitesimally close to a single value, or some such thing. Those are useful too so we don't necessarily require a standard density. The fact that we include nonstandard densities means we are able to actually represent arbitrary probability measures, where the measure of an open set S is \int_S n(x)/L dx

To reverse this process, start with the function n(x)/L, multiply the function by L then for each grid point append a string of n values equal to x on our sequence. Once we have the full sequence x_i we can "sample from it" by choosing a permutation of 1..L.

Now, let's canonicalize our sequences, namely if we have a sequence x_i we can sort it in ascending order. Now for every function n(x)/L we can find a canonical x_i of length L.

How many different RNG sequences of length L are there for a given density function? There are one for each permutation, or (L!). How big is that? Remember Stirling's approximation: \log(N!) = N \log(N) - N which is already quite accurate at N = 3. So there are about \exp(L \log(L) - L) different RNGs corresponding to the same distribution. We could perhaps require only permutations that leave no element in place, called the derangements, there are about L!/e of those so that'd put us at \exp(L\log(L)-L-1)

Are all of these really "random number generators" that will satisfy Per Martin-Lof? Probably not. But a big fraction of them will be. Imagine that an oracle tells us Martin-Lof's "most powerful test" for a binary random number generator. Take the subset of the permutations which cause the sequence x_{c_i} to pass that test at the \alpha level, when x_i is the sequence {0,0,\ldots,0,1,1\ldots,1} where the first half of the sequence is 0 and the second half 1. Constrain our choice of c_i to be one of these sequences, it's still going to be a big big number of sequences (asserted without proof).

So, it's always possible to take a probability measure (a nonstandard density) and convert it into many different sequences of IID samples from that probability measure (a random number generator) and back and forth.

Mathematically, this means in essence that "you can sample from any probability distribution". But this doesn't mean that placing a probability distribution over real world data D implies that you think D is a sample from that distribution. As a Bayesian, you are free to reject that. The probability involved is in your head or at least assumed for the purpose of calculation (ie. in the head of the Jaynesian Robot, or of MC Stan).  As Jaynes says: "It is therefore highly illogical to speak of 'verifying' (3.8 [the Bernoulli urn equation]) by performing experiments with the urn; that would be like trying to verify a boy's love for his dog by performing experiments on the dog." - E.T. Jaynes, Probability Theory


3 Responses leave one →
  1. Kit Joyce permalink
    July 10, 2016

    Hi Daniel, I have a question very tangentially related to this post so I thought here's as good a place as any to post it. I'm essentially trying to understand how generation of random numbers from a given distribution interacts with machine arithmetic. My thoughts on it all are a little half-baked so I'll try not to say anything too silly, but will probably fail 🙂

    Consider X ~ U(0,1), i.e. a real-valued RV uniformly distributed between 0 and 1. Reasonably obviously X is irrational almost surely. But if I ask for X* <- runif(1) in R, I'll get a machine number, i.e. from a particular subset of the rationals. At the moment I think of this as representing the interval of real numbers for which this is the closest machine number, that my 'actual' X is somewhere in that set (which has positive measure) and my (relative) error in using X* instead of X is both small and can be handled using the part of the theory of numerical analysis which is designed for exactly this. So what's the problem, you ask? Well, first of all, is this how other people think of it, and have they written about it?

    Secondly, pseudorandom number generators I'm familiar with produce integers, for simplicity say in the range 0 to 2^n -1, uniformly (with the uniformity being one of the things we test). So if we then want a double-precision float between 0 and 1, we need to make sure that bits we put in do represent a number in this range, and moreover, if my intuition about intervals above is correct, we no longer want to select uniformly as smaller numbers will also have smaller intervals around them, e.g. 1 should represent the interval (1-eps/2,1) (I think...), whereas say, a denormalised number will represent a much narrower interval. So we have to 'invert' the widths of the intervals or something... And now, thinking about it, I'm wondering if the 'redundancy' of numbers that can be represented at lower precision helps cancel this out... or do we need to do another modification to ensure we're putting in the maximum precision version of that number... wow I'm really confused, sorry!

    I actually don't really need an answer necessarily, I just can't even find any references that discuss something like this. Google has absolutely failed me :). I certainly haven't found it in literature on PRNGs, or machine arithmetic.

    Even more half-baked-ly, given that we are working in finite precision, all our distributions are discrete, so, do we actually need/can we actually use measure theory? Could there be a useful theory of probability only on finite subsets of R that look like sets of machine numbers and specifically intended to give us computational tools? Or is it actually a really good idea to still think of continuous objects and deal with them like we do in numerical analysis?

    Alright, brain dump complete; if you have any thoughts or references I'd be very appreciative!

    • Daniel Lakeland
      July 10, 2016

      Hi Kit. thanks for asking this interesting question. Let me think on it and I will post something in the next few days, possibly as a new blog post on this topic.

      • Kit Joyce permalink
        July 11, 2016

        Thanks Daniel, looking forward to it.

        I've found some more of your previous posts on NSA like this which are really helpful, though I'm still interested in the 'interval inversion' thing I mentioned above. I'm finding some stuff on constructive NSA which might be relevant too. Ahh, so much to learn...

Leave a Reply

Note: You can use basic XHTML in your comments. Your email address will never be published.

Subscribe to this comment feed via RSS