Suppose you are in a kindergarten class, and you have a bag containing 100 mixed nuts: Brazil nuts, walnuts, almonds, hazelnuts, etc.

Now one of the children reaches in, pulls out a nut, and tells you what kind it is. How much does it weigh?

Let's make the assumption that the nuts remaining are not radically different from the one chosen, at least for that type.

Consider the bag of nuts as a prior over the weight of the nuts. It's the range of weights that we consider reasonable. Our data is the kind of nut...

Repeatedly reach into the bag of nuts, and pull out one nut. If the nut is the wrong type, then you know it's not like the data, so throw it back. This is the likelihood: it down-weights parameter values that don't fit the data. If it's the right type, weigh it (this is a sample of the parameter, the unobserved weight) and write down the weight. Once you have weighed nuts of your type many times, build a histogram of the weights you wrote down.

The nut your child is holding probably has a weight that is near the peak in your histogram, and the weight probably isn't outside the range of 95% of the weights in your histogram. This is the Bayesian way of thinking in a nutshell.
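The procedure above can be simulated in a few lines. This is only a sketch under made-up assumptions: the typical weights in `NUT_TYPES`, the helper names, and the choice to draw with replacement are mine, not part of the classroom demonstration.

```python
import random

# Hypothetical (mean, spread) of weights in grams for each kind of nut.
NUT_TYPES = {
    "brazil": (5.0, 0.8),
    "walnut": (11.0, 1.5),
    "almond": (1.2, 0.2),
    "hazelnut": (2.5, 0.4),
}

def make_bag(rng, n=100):
    """Fill the bag: each nut is a (kind, weight) pair."""
    bag = []
    for _ in range(n):
        kind = rng.choice(list(NUT_TYPES))
        mean, sd = NUT_TYPES[kind]
        bag.append((kind, max(0.1, rng.gauss(mean, sd))))
    return bag

def sample_posterior(rng, bag, observed_kind, n_keep=1000):
    """Draw nuts with replacement; record the weight only when the kind matches."""
    if not any(kind == observed_kind for kind, _ in bag):
        raise ValueError("no nut of that kind in the bag")
    kept = []
    while len(kept) < n_keep:
        kind, weight = rng.choice(bag)
        if kind == observed_kind:  # wrong kind: throw it back, record nothing
            kept.append(weight)
    return kept

def central_interval(values, mass=0.95):
    """Range that contains the central `mass` fraction of the recorded weights."""
    ordered = sorted(values)
    tail = (1.0 - mass) / 2.0
    low = ordered[int(tail * (len(ordered) - 1))]
    high = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return low, high

rng = random.Random(0)
bag = make_bag(rng)
weights = sample_posterior(rng, bag, "almond")
low, high = central_interval(weights)
print(f"95% of recorded almond weights fall between {low:.2f} g and {high:.2f} g")
```

The list `weights` plays the role of the written-down weights, and `central_interval` gives the 95% range read off the histogram.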

5 Responses
1. August 27, 2015

Note, the unfortunate problem is that this sampling-from-the-bag analogy does confuse probability and frequency. The sack of nuts represents a prior over nut weights by using a sample to represent probability through frequency.

Consider making a histogram of all the nut weights, and a histogram of the nut weights you get after discarding the nuts that don't fit the data (the ones that are the wrong type). See how the probability distribution, as represented by the sample, changes from before the data to after.

2. August 28, 2015

Consider this: the source of variability in the histogram is NOT representative of the variability in the actual weight of the one nut being held by the child. Put on the scale, that one nut will give the same weight every time, ± the sensitivity of the scale, so let's say 1%. The variability in the "posterior" histogram (the histogram after you've thrown away the values that are inconsistent with the knowledge of the type of nut) reflects the fact that all we know is that the one nut is from the general range of weights represented by the other nuts in the bag.

If we had other data, we could get more specific. For example, if we had the child measure the diameter of the nut, we could throw away nuts that are of a different type, or the same type but have a diameter that is too far away from the measurement of the one nut. Such a histogram will be more narrowly peaked and will give us more information about the one nut.
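As a rough illustration of how a second measurement narrows things down, here is a sketch with an invented bag in which each nut has a kind, a diameter, and a weight. The `WEIGHT_FACTOR` values, the 1 mm tolerance, and the 13 mm measurement are all assumptions made up for the example. Checking every nut once is used here as the exhaustive version of repeatedly drawing from the bag.

```python
import random

rng = random.Random(1)

# Made-up scale factors: weight in g is assumed to be roughly
# factor * diameter_mm**3, with some scatter. Illustrative only.
WEIGHT_FACTOR = {"hazelnut": 0.0011, "almond": 0.0007}

# Hypothetical bag of (kind, diameter in mm, weight in g).
bag = []
for _ in range(300):
    kind = rng.choice(list(WEIGHT_FACTOR))
    diameter = rng.uniform(10.0, 16.0)
    weight = WEIGHT_FACTOR[kind] * diameter**3 * rng.uniform(0.9, 1.1)
    bag.append((kind, diameter, weight))

def matching_weights(bag, kind, diameter=None, tolerance_mm=1.0):
    """Weights of nuts consistent with the data: same kind and, if a
    diameter was measured, a diameter within tolerance_mm of it."""
    kept = []
    for nut_kind, nut_diameter, nut_weight in bag:
        if nut_kind != kind:
            continue  # wrong type: throw it back
        if diameter is not None and abs(nut_diameter - diameter) > tolerance_mm:
            continue  # right type, but diameter too far from the measurement
        kept.append(nut_weight)
    return kept

by_kind = matching_weights(bag, "hazelnut")
by_kind_and_size = matching_weights(bag, "hazelnut", diameter=13.0)

for label, ws in [("kind only", by_kind), ("kind + diameter", by_kind_and_size)]:
    if ws:
        print(f"{label}: {len(ws)} nuts, weights from {min(ws):.2f} to {max(ws):.2f} g")
```

Because the second list is a subset of the first, its range of weights can only stay the same or shrink.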

3. August 28, 2015

Also note, if you actually want to run this demonstration in a class, I recommend buying various sets of bulk dice off Amazon, and/or super-balls etc. Nuts are an allergy issue for some people, and this is especially an issue in elementary school classes.

I am actually thinking of doing something like that at my child's elementary school. Any suggestions for ways to improve the demonstration, especially a way to get an accept-reject rule based on a continuous measurement, would be welcome. For example, maybe cutting soda straws to random lengths, and then weighing them and rejecting according to:

(data-sample_len) ~ normal(0,5) in mm or something. It would be nice to have a way to use dice to generate a random number to decide whether to accept/reject. We could use a high-quality printout of a normal distribution curve, and a percentage die (10-sided) to do accept/reject (normalize the curve so that the peak is at 100, accept if your random number falls below the percentage for the given difference).
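Here is a sketch of that dice rule in code. The function names are hypothetical, and I am assuming the percentage comes from two ten-sided rolls (tens and ones, giving 0 to 99) and that the 5 in normal(0,5) is a standard deviation in mm.

```python
import math
import random

def acceptance_percent(diff_mm, sd_mm=5.0):
    """Height of the normal curve at diff_mm, rescaled so the peak is 100."""
    return 100.0 * math.exp(-0.5 * (diff_mm / sd_mm) ** 2)

def roll_percent(rng):
    """Two ten-sided rolls read as tens and ones: an integer from 0 to 99."""
    return 10 * rng.randrange(10) + rng.randrange(10)

def accept(rng, data_mm, sample_len_mm, sd_mm=5.0):
    """Accept the sampled straw if the roll falls below the curve's percentage."""
    return roll_percent(rng) < acceptance_percent(data_mm - sample_len_mm, sd_mm)

rng = random.Random(2)
data_mm = 120.0  # hypothetical measured length of the one straw
box = [rng.uniform(80.0, 160.0) for _ in range(500)]  # straws cut to random lengths
kept = [length for length in box if accept(rng, data_mm, length)]
print(f"kept {len(kept)} of {len(box)} straws")
```

A straw whose length matches the data exactly is always kept, since every roll from 0 to 99 falls below 100; one that is 5 mm off is kept about 61% of the time.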

4. August 28, 2015

I'm liking the soda straw idea, and the next step with the soda straws would be to do Bayesian inference on the weight per length.

Define a uniform prior on weight per length. Select a slope value, and draw the line with that slope using a pencil on a graph of the straw data. Accept the line if the average absolute error is in the high probability region of an appropriate gamma distribution. Do acceptance by inking in the line. This makes the calculations easy enough if you have a printout of the graph of the gamma. To choose the gamma, consider that you have a positive error at each measurement, that you know the order of magnitude of the error, and that you have N (length, weight) pairs for the straws. Use a gamma(N,1/err_mag) distribution, normalized so the peak is at 1.

To make the data have enough noise, you might need a little pile of straw scraps: at each measurement, remove some scraps and add new ones, keeping the number of scraps constant, so there's a positive and smallish error at each measurement...
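A sketch of the slope procedure on simulated straws follows. Everything numeric here (the true slope, `ERR_MAG`, the prior range, and exponential scrap errors) is invented for illustration. One further assumption: I apply the gamma to the summed absolute error rather than the average, because a gamma with shape N and rate 1/err_mag has its peak at (N-1)*err_mag, which is the scale of a sum of N errors. Using the average instead would mean rescaling the rate to N/err_mag.

```python
import math
import random

rng = random.Random(3)

TRUE_SLOPE = 0.002  # g per mm, hypothetical
ERR_MAG = 0.05      # g, assumed typical size of the scrap error

# Simulated (length mm, weight g) pairs with a positive error on each weight.
straws = []
for _ in range(8):
    length = rng.uniform(50.0, 200.0)
    straws.append((length, TRUE_SLOPE * length + rng.expovariate(1.0 / ERR_MAG)))
N = len(straws)

def gamma_peak_one(x, shape, rate):
    """Gamma(shape, rate) density rescaled so its peak is 1 (needs shape > 1)."""
    if x <= 0.0:
        return 0.0
    mode = (shape - 1) / rate
    return math.exp((shape - 1) * math.log(x / mode) - rate * (x - mode))

def total_abs_error(slope):
    """Summed absolute vertical distance from the straws to a line through the origin."""
    return sum(abs(weight - slope * length) for length, weight in straws)

accepted = []
for _ in range(5000):
    slope = rng.uniform(0.0, 0.006)  # "pencil in" a line from the uniform prior
    if rng.random() < gamma_peak_one(total_abs_error(slope), N, 1.0 / ERR_MAG):
        accepted.append(slope)       # "ink in" the accepted line

print(f"accepted {len(accepted)} of 5000 candidate slopes")
if accepted:
    print(f"accepted slopes range from {min(accepted):.5f} to {max(accepted):.5f} g/mm")
```

In class, `rng.random()` would be replaced by a dice roll compared against the printed gamma curve, in the same way as the accept/reject rule for straw lengths.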