# The importance of Probability vs Frequency in Quantum Mechanics

There’s been some discussion recently on Andrew Gelman’s blog about Quantum Mechanics and probability… For example here. In that thread I raise the following thought experiment.

We set up a classic “two slit” apparatus. A laser fires single photons towards an intermediate screen with two slits in it, and then on towards a white screen where the photon position can be recorded (say by a fancy CCD).

One of the slits in the intermediate screen has a little shutter that can be open or closed and that is driven by a source of quantum noise. For example, the shutter flip-flops every time a Geiger counter detects a radioactive decay, or it goes back and forth based on some radio noise from the atmosphere or something. Either way, in the long run the fraction of the time that it’s open is 1/2.

Once the far detector receives the photon and records its position, the apparatus beeps. Finally, a photograph of the position of the shutter is also taken at the time the photon is fired, so we can determine whether it was open or closed, but only by reviewing the record.

Now, let’s talk about Probabilities, denoted P, taken to mean Bayesian plausibility measures over facts about the world… and Frequencies denoted F counting how often a given thing happened in an ensemble of those things. Let’s assume that in addition to whatever I condition on below, we also add | Setup, that is, assuming our knowledge of the experimental setup as described above.

- Write down the probability p( Flash at X | Beep)
- Note that all we have is our knowledge of the setup, and the fact that a photon was received at some point on the detector. We would use our QM knowledge to calculate |Psi(x)|^2 for the two cases, one with the shutter open and one with the shutter closed, and create a 50/50 mixture model of the two.

- Write down the probability p(Photon went through the first slit | Beep)
- This is intended to be a trick question. It stabs right at the heart of QM interpretation. As far as I can tell, there are *some* interpretations of QM in which a photon has a well defined position at all times (nonlocal hidden variable theories such as Bohm’s) and *some* interpretations in which the photon doesn’t exist until it comes into being by colliding with the final detector. The latter is generally how the Copenhagen interpretation looks, though it doesn’t seem to me to be a single well defined interpretation; for example, this is how Griffiths describes it in the intro to his standard textbook (~ pg 6). And maybe there are some other interpretations, like Many Worlds, where the photon goes through both slits and it’s just a question of which world we happen to be in.
- Nevertheless, if we take a Bohmian type interpretation, then based on only the Beep, we can say there is a 50% chance the shutter was closed, so it must have gone through the first slit, and a 50% chance the shutter was open, and if the shutter was open things are more complicated… see below.

- Write down the Probability p(second slit was open | flash at X, Beep) (in this case the Beep just tells us that the photon fired… so we don’t have to include the option “no photon has landed yet”, we’ll drop the Beep)
- By Bayes’ theorem we can write down p(flash at X | second slit open) p(second slit open) = p(second slit open | flash at X) p(flash at X)
- For p(flash at X), our only way of assigning a probability is to use our knowledge of the apparatus: calculate |psi|^2 for each situation and mix them, |psi_open|^2 * 0.5 + |psi_closed|^2 * 0.5. Also, p(second slit open) is just 0.5 and p(flash at X | second slit open) is |psi_open|^2, so we have:
- p(second slit open | flash at X) = |psi_open|^2 * 0.5 / (|psi_open|^2 * 0.5 + |psi_closed|^2 * 0.5)
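
To make the mixture and the Bayes step concrete, here is a minimal Python sketch. It assumes idealized far-field (Fraunhofer) patterns with the single-slit envelope treated as centered on the axis, and it uses made-up values for the slit width `A`, slit separation `D`, wavelength `LAM`, and screen distance `SCREEN`. None of those names or numbers come from the setup above; they are only there so the formulas can be evaluated.

```python
import math

# Hypothetical apparatus values (assumptions, not from the thought experiment).
A = 20e-6        # slit width, metres
D = 100e-6       # slit separation, metres (must be >= A)
LAM = 633e-9     # wavelength, metres
SCREEN = 1.0     # distance from slits to detector, metres


def sinc2(u):
    """(sin u / u)^2 with the removable singularity at u = 0 filled in."""
    return 1.0 if u == 0.0 else (math.sin(u) / u) ** 2


def density_closed(x):
    """|psi_closed|^2: single-slit far-field density, normalized over x."""
    beta = math.pi * A * x / (LAM * SCREEN)
    return (A / (LAM * SCREEN)) * sinc2(beta)


def density_open(x):
    """|psi_open|^2: two-slit far-field density.

    The factor 2 normalizes the pattern over x; this holds when D >= A.
    """
    alpha = math.pi * D * x / (LAM * SCREEN)
    beta = math.pi * A * x / (LAM * SCREEN)
    return (2 * A / (LAM * SCREEN)) * math.cos(alpha) ** 2 * sinc2(beta)


def p_flash(x):
    """p(flash at X | Beep): the 50/50 mixture of the two patterns."""
    return 0.5 * density_open(x) + 0.5 * density_closed(x)


def p_open_given_flash(x):
    """p(second slit open | flash at X) by Bayes' theorem with a 1/2 prior."""
    denominator = p_flash(x)
    if denominator == 0.0:
        raise ValueError("flash at x has zero density under both patterns")
    return 0.5 * density_open(x) / denominator


print(p_open_given_flash(0.0))                     # central bright fringe
print(p_open_given_flash(LAM * SCREEN / (2 * D)))  # first two-slit dark fringe
```

In this idealization the envelope cancels and the posterior reduces to 2cos²(alpha) / (2cos²(alpha) + 1), so a flash on a two-slit dark fringe is strong evidence the shutter was closed, while a flash on a bright fringe only raises the probability that it was open to 2/3.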

- Now calculate p(Second slit was open | flash at X, photo of second slit, Beep)…
- Trick question: the photo of the second slit tells us all we need to know about whether the second slit was open or closed. This is either 1 or it’s 0.

- Let’s start quantifying our knowledge of where the photon went under additional information… write down p(photon went through the first slit | flash at X, photo of second slit, Beep)
- You may see where this is going. If we know from a photo that the second slit was closed, then the photon, to the extent that we allow it to have a trajectory, must have gone through the first slit.
- On the other hand, if the photo shows that the shutter was open, then the photon went through either the first slit or the second slit, but we don’t know which. If we go along with Bohm, information about where it struck the detector should inform us somewhat about which slit it went through… So we calculate the wave function, and the strange trajectory of the particle. We run an Approximate Bayesian Computation type calculation. We select a photon initial position at the aperture of the laser according to our best guess of the distribution of photons at the aperture (let’s say uniform across the aperture), we propagate that photon through space according to Bohm’s equation, and we observe where it hits the final screen. We do this for a tremendously large number of trials, keeping only those photons that actually strike the screen within the range X ± epsilon, where epsilon is the width of the CCD pixel or whatever. Then we calculate what fraction of these photons went through the first slit. This is p(photon went through the first slit | flash at X, both slits open).
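
The ABC recipe above can be sketched in a few lines of Python, under heavy simplifying assumptions that are mine and not part of the thought experiment. The particle is treated as a massive particle in natural units (a Bohmian treatment of actual photons is more delicate), the wave function just behind the slits is taken to be two equal Gaussian packets in one transverse dimension, time stands in for distance to the screen, and trajectories start at the slits rather than at the laser aperture. All names and numbers (`SIGMA0`, `SLITS`, `simulate`, and so on) are hypothetical.

```python
import cmath
import random

HBAR_OVER_M = 1.0     # natural units (assumption)
SIGMA0 = 1.0          # initial packet width at each slit (assumption)
SLITS = (-4.0, 4.0)   # first slit at x < 0, second slit at x > 0 (assumption)


def velocity(x, t):
    """Bohm guidance velocity (hbar/m) * Im(psi'/psi) for two free Gaussian packets."""
    s_t = SIGMA0 * (1 + 1j * HBAR_OVER_M * t / (2 * SIGMA0 ** 2))
    numerator = 0j
    denominator = 0j
    for center in SLITS:
        packet = cmath.exp(-(x - center) ** 2 / (4 * SIGMA0 * s_t))
        numerator += -(x - center) / (2 * SIGMA0 * s_t) * packet
        denominator += packet
    return HBAR_OVER_M * (numerator / denominator).imag


def propagate(x0, t_final, steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t_final with classic RK4."""
    dt = t_final / steps
    x, t = x0, 0.0
    for _ in range(steps):
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = velocity(x + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = velocity(x + dt * k3, t + dt)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return x


def simulate(n, t_final=10.0, steps=200, seed=0):
    """Return (start, landing) pairs; starts are drawn from the two-packet density."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Approximates |psi|^2 at t = 0 by a 50/50 mix, ignoring the tiny overlap term.
        x0 = rng.gauss(rng.choice(SLITS), SIGMA0)
        pairs.append((x0, propagate(x0, t_final, steps)))
    return pairs


def abc_fraction_first_slit(pairs, x_obs, eps):
    """ABC rejection step: keep landings within x_obs +/- eps, then count starts at x < 0."""
    accepted = [x0 for x0, x_end in pairs if abs(x_end - x_obs) <= eps]
    if not accepted:
        raise ValueError("no simulated trajectory landed within x_obs +/- eps")
    return sum(1 for x0 in accepted if x0 < 0) / len(accepted), len(accepted)


pairs = simulate(400)
print(abc_fraction_first_slit(pairs, 8.0, 1.5))
print(abc_fraction_first_slit(pairs, -8.0, 1.5))
```

One caveat about this toy model: because it is mirror symmetric, the guidance velocity vanishes on the midline and trajectories do not cross it, so the estimated fraction comes out as 0 or 1 depending on which side of the screen X lies on. An asymmetric setup would be needed to get intermediate values.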

Now, let’s examine instead the frequencies:

- F(flash at X | beep) = either 1 or 0, you have to ask the CCD if the flash occurred at X and find out. At the moment, at best we can put a Bayesian probability on this F. The Bayesian probability could be calculated from calculations above!
- F(flash at X | CCD) = {1,0}, one or the other: our Bayesian probability of the frequency being one or the other collapses down to either 1 or 0. Is this “wave collapse”? No, it’s conditioning on information.
- Write down F(second slit was open) = {0, 1} either 0 or 1 depending on what actually happened. However we can put a Bayesian prior of 1/2, 1/2 on each because of how we arranged the flip-flop shutter.
- Write down F(second slit was open | record of the shutter) = a single number, either 0 or 1; just look at the record of the shutter position and find out. Again, this is not wave collapse: the value was determined by whether or not the Geiger counter detected a decay.
- Write down F(second slit was open | flash at X) = {0, 1} depending on what X is… If X is the actual value of the X where the flash occurred, then = 1 otherwise = 0.
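
As a small illustration of the distinction for the shutter, here is a Python sketch. It uses a seeded pseudo-random generator as a stand-in for the quantum noise source, which is purely an assumption for the sake of a runnable example; `run_shutter` and `frequency_given_record` are hypothetical names.

```python
import random


def run_shutter(n_trials, seed=0):
    """Record, for each fired photon, whether the second slit was open.

    A pseudo-random generator stands in for the Geiger counter (assumption).
    """
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(n_trials)]


def frequency_given_record(record_entry):
    """F(second slit was open | record of the shutter) for one trial: 0 or 1."""
    return 1 if record_entry else 0


PRIOR_OPEN = 0.5  # Bayesian probability before looking at any record

records = run_shutter(10_000)
single_trial = frequency_given_record(records[0])
ensemble_fraction = sum(records) / len(records)

print("prior probability:", PRIOR_OPEN)
print("first trial, after reading the record:", single_trial)
print("fraction open across the ensemble:", ensemble_fraction)
```

The prior of 1/2 describes our information before reviewing the record, the single-trial value is just a fact read off the record, and only the ensemble fraction is a frequency in the counting sense.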

Clearly, we drive a strong wedge here between the interpretation of probability (meaning plausibility of what happened given information that we have) and frequency in repetitions. Furthermore we make a strong argument for the utility of a Bohmian viewpoint, because *it lets us calculate the probability that a quantum particle went through a particular region of space on its way to interacting at a detector*. Classically speaking, a Copenhagen interpretation says “the particle doesn’t exist, or the question of where the particle is is not meaningful until it is detected”. For Bohm, this is bunk. Conditional on knowing where the particle landed, we have a straightforward way to back out which paths are more or less likely…

Is that a desirable property of a theory? That it gives us probabilities for intermediate outcomes? It is to me. Is it desirable enough to put up with the nonlocality of Bohm’s equation? I actually think the nonlocality of his equation is pretty nifty, I’m not sure what the heck is wrong with physicists that they tend to reject that outright. It seems like a lot of them are wishy washy on this topic. I *can* understand why physicists would not want classical information traveling faster than light. But it doesn’t seem Bohm’s theory allows this anyway, so it’s not a real objection.

That Gelman blog comment is just the worst. Andrew has some real blind spots and this is definitely one of them (although it doesn’t come up as much as his others). The double slit experiment shows that:

Freq(x|both slits open) != Freq(x|left only open)*alpha+Freq(x|right only open)*(1-alpha)

for any value of alpha. As such it doesn’t say the basic equations of probability theory are wrong.
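
A quick numerical check of that inequality, as a sketch under assumptions of my own: idealized far-field patterns, made-up apparatus values (`A`, `D`, `LAM`, `SCREEN`), and single-slit patterns centered on their own slits. At a two-slit dark fringe the both-slits frequency is essentially zero while each single-slit frequency is positive, so no mixing weight can close the gap.

```python
import math

# Hypothetical apparatus values (assumptions for illustration only).
A = 20e-6        # slit width, metres
D = 100e-6       # slit separation, metres
LAM = 633e-9     # wavelength, metres
SCREEN = 1.0     # slit-to-detector distance, metres


def sinc2(u):
    return 1.0 if u == 0.0 else (math.sin(u) / u) ** 2


def freq_single(x, slit_center):
    """Far-field density with only the slit at slit_center open."""
    beta = math.pi * A * (x - slit_center) / (LAM * SCREEN)
    return (A / (LAM * SCREEN)) * sinc2(beta)


def freq_both(x):
    """Far-field density with both slits open (interference fringes)."""
    alpha = math.pi * D * x / (LAM * SCREEN)
    beta = math.pi * A * x / (LAM * SCREEN)
    return (2 * A / (LAM * SCREEN)) * math.cos(alpha) ** 2 * sinc2(beta)


def mixture_gap(x, weight):
    """Freq(x | both) minus the weighted mix of the two single-slit patterns."""
    left = freq_single(x, -D / 2)
    right = freq_single(x, +D / 2)
    return freq_both(x) - (weight * left + (1 - weight) * right)


x_dark = LAM * SCREEN / (2 * D)  # first dark fringe of the two-slit pattern
closest_gap = max(mixture_gap(x_dark, k / 100) for k in range(101))
print(closest_gap)  # stays well below zero for every weight tried
```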

From Andrew:

“What statisticians call “probability theory” is what physicists call “Boltzmann statistics” or “hidden-variable models.””

That’s a bizarre claim. I assume he meant that in statistical mech and some hidden variable theories people use regular old probability theory.

“These models are not in general true, in the sense that they do not apply in quantum mechanics. In mathematics we say that a conjecture is false if there are any counterexamples to it. In that sense, probability theory is false.”

Models make claims about the real world. Probability theory is just a slick way of keeping track of counts (hence the deep connections between probability and combinatorics).

“There are some settings such as coin flipping and die rolling where probability theory is evidently true”

I just can’t fathom what Andrew means by this other than “At some point someone made a probability calculation using coins or dice that roughly turned out right”. The reason this sometimes happens with coins and dice is, of course, that the specific things they’re trying to predict (frequencies of occurrence) are highly insensitive to the exact physical details. “Probability theory” served merely to do those counts well enough to realize that.

“and other settings such as Bose-Einstein statistics, Fermi-Dirac statistics, and the two-slit experiment where probability theory, as it would be intuitively applied, is false”

Bose-Einstein and Fermi-Dirac statistics use regular probability theory as Andrew defines it!!!! Here Andrew really shows his ignorance! The only place QM enters into those statistics is the occupation numbers (0,1 for F-D and 0,1,…, infinity for B-E), the rest of the calculation is purely standard probability theory.

In fact, there’s a fairly well known family of Maxent distributions used in Inventory Management which allows each item in inventory to take the values 0, 1, …, N. This is as purely classical as classical gets, and yet when N=1 the distributions become F-D statistics and when N=infinity you get B-E statistics.
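
Here is a small Python sketch of that limit, assuming the maxent distribution takes the form p(n) ∝ exp(-n·x) on n = 0, 1, …, N, where `x` plays the role of (energy − chemical potential)/kT. That parameterization and the function names are my assumptions for illustration, not something taken from the inventory literature.

```python
import math


def mean_occupancy(x, n_max):
    """Mean of p(n) proportional to exp(-n * x) on n = 0..n_max, by direct counting."""
    weights = [math.exp(-n * x) for n in range(n_max + 1)]
    return sum(n * w for n, w in enumerate(weights)) / sum(weights)


def fermi_dirac(x):
    return 1.0 / (math.exp(x) + 1.0)


def bose_einstein(x):
    return 1.0 / (math.exp(x) - 1.0)


x = 0.7  # arbitrary positive value of (energy - mu) / kT
print(mean_occupancy(x, 1), fermi_dirac(x))      # N = 1 matches F-D
print(mean_occupancy(x, 200), bose_einstein(x))  # large N approaches B-E
```

Nothing in `mean_occupancy` is quantum: it is a weighted count over the allowed values, and only the choice of `n_max` distinguishes the two cases.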

Andrew gets himself tied up in knots like that because in truth he’s a Frequentist at a philosophical level. He started out as a Frequentist and never really got over it. A philosophical Frequentist heavily mixes ontological claims about the real world into their foundations of probability theory. That’s why he thinks equations that are essentially equivalent to “counting” are “right” or “wrong”, rather than the more reasonable “counting is counting, models are the things that can be right or wrong”.

“counting is counting, models are the things that can be right or wrong”

Can I get this on a T-Shirt?

Hah! Nice. When I taught ecology grad students how to build up to Bayesian state-space models using Stan, I made sure to re-derive probability from combinatorics just to drive this point home. Most grad students come in with the temporal frequentist definition of probability, and it gets in the way of appreciating the simplicity and power of conditional probability modeling.

I’m a huge fan of Andrew Gelman. I have to say that I was most confused by his reveal of a frequentist interpretation of the prior a while back. It seemed quite muddled to me, but I don’t claim to have understood the argument well enough to say much about it.