Back of the Envelope Cost-Benefit on pulling your kids from school

2020 March 4
by Daniel Lakeland

It is clear that COVID is spreading in communities in Northern California and Washington. The time until it is confirmed to be spreading in SoCal is probably a few days. It will always be confirmed *after the fact*, which means it is probably spreading in the SoCal community at the moment, though in the early stages. Outside China, cases are increasing exponentially with a doubling time of about 5 days, give or take, which you can read off the graph at several websites such as the linked map site (click the logarithmic graph on the lower right, and read off the yellow dots for spread outside China).

I personally view it as inevitable that PUSD will decide to close schools. I don’t know what their timeline will be, but as these are typically committee decisions and there is risk either way (too early vs too late) I expect them to be delayed until the choice becomes obvious. On a doubling-every-5-days trajectory, that probably means somewhere between 10 and 20 days from now (which would mean somewhere around 800 to 3000 cases in the US). Spring break being Mar 30, I could imagine they’ll try to stay open til the 25th or so, and then not reopen after spring break. Though more pro-active decision making might mean closure in the next 5-10 days or so, now that Pasadena has declared a state of emergency. All this is more or less my own opinion based on reading the growth charts, and seeing the responses from large organizations canceling conferences and things.

Now, at what point is it actually logical to pull your kids from school? I’m going to do this just for a family with a stay at home parent, because the calculation for lost days of work is much harder and depends on a lot of factors. We can back-of-the-envelope this as follows: the cost of a lost day of education is on the order of a hundred dollars or so. Let’s say $20/hr x 6hr/day = $120/day. If the stay at home parent can provide some of this education, the cost might drop to say $50/day…

Now, what are the costs associated with sickness? Let’s just do the calculation for the case where one parent gets seriously ill and dies. For a child in elementary school let’s just put this around say $10M.

Now, what’s the chance of death if you have definite exposure? It’ll be something like a 100% chance of getting sick and a 0.5% chance of death (assuming the parent doesn’t have underlying conditions and isn’t unusually old)… So the expected cost is $10M * 0.005 = $50,000… At $50/day for lost education, that expected cost buys 1000 days, so by this logic you should be willing to pull your kids from school about 1000 days early. Of course, it’s way too late to be 1000 days early, so basically you should pull your kids from school TODAY.

Now, suppose you have a job making $100k/yr, and you just get cut off from that job. That’s $385/day (which you don’t take home all of, but whatever). So if you add $50/day to that for educational loss, you should be willing to pull your kids about 115 days early. It’s also too late for that… So again, pull your kids TODAY.
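
The arithmetic above can be written out as a minimal Python sketch. The dollar figures are the rough guesses from this post; the function names and the 260 working days per year used to turn $100k/yr into a daily figure are my own assumptions.

```python
# Back-of-the-envelope break-even calculation.
# All dollar figures are rough guesses from the post, not measured values.

WORKDAYS_PER_YEAR = 260  # assumption: 52 weeks x 5 working days


def expected_loss(cost_of_death, p_death_given_exposure):
    """Expected dollar cost of a definite exposure."""
    return cost_of_death * p_death_given_exposure


def break_even_days(expected_cost, daily_cost):
    """Number of days at home whose total cost equals the expected loss."""
    return expected_cost / daily_cost


loss = expected_loss(10_000_000, 0.005)

# Stay-at-home parent: only the $50/day education cost counts.
education_only = break_even_days(loss, 50)

# Working parent who loses a $100k/yr salary on top of the education cost.
salary_per_day = 100_000 / WORKDAYS_PER_YEAR
with_lost_salary = break_even_days(loss, salary_per_day + 50)

print(f"expected loss: ${loss:,.0f}")
print(f"break-even, education cost only: {education_only:.0f} days")
print(f"salary per working day: ${salary_per_day:.0f}")
print(f"break-even, salary plus education: {with_lost_salary:.0f} days")
```

Changing the probability of death or the daily cost by a factor of a few still leaves the break-even far longer than the few weeks actually at stake, which is the point of the estimate.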

Any way I back of the envelope this, it’s time to pull your kids from school… I don’t see a big enough flaw in all these calculations that would lead to waiting another 20 days.

Bayesian Decision Theory and Coronavirus

2020 March 4
by Daniel Lakeland

You’d have to seriously be living under a rock to not know about Coronavirus… But no matter how much you know about it at the moment, you probably don’t really know what we should do about it as a society. I mean, what are the various factors involved, should we close schools, churches, sporting events… what to do at nursing homes? Who should go to work and who should stay home? How would they afford it?

This is because all those questions are actually answerable to some extent (probabilistically at least) but there isn’t a group tasked with doing the analysis. It would be a good idea. Like, what the heck is the WHO doing if not at least staffing say 10 people who develop disease modeling software, and have several racks of computers to run Monte Carlo scenarios?

Well, whatever, if they were going to hire some people to do this stuff, what does the analysis look like? Here’s the general idea:

  1. Describe the factors that are associated with costs…
    1. Loss of Quality Adjusted Life Years (QALYs). This is the cost associated directly with “you don’t feel well for N days” all the way up to early death… The direct real-world cost of loss of healthy time.
    2. Loss of productivity: people who are sick don’t provide services to other people, they don’t produce goods, etc.
    3. Cost of treatment: people who are sick require other people to take care of them. They require medicines. Etc etc.
  2. Describe the factors associated with reduction of cost, or creation of benefits (or increasing costs above what they otherwise might be):
    1. Treatment of a person may shorten their sickness time.
    2. Treatment of a person may avoid them spreading the disease.
    3. Quarantine or Social Distancing may reduce spreading rate.
    4. Fast spreading rate may result in overwhelming local medical care, resulting in lack of care and much worse symptoms even death.

Once we put all these different factors into a model of the costs of any given scenario, we have the structure for a decision, but we still don’t know what the right values are for the parameters. For example, what’s the right cost of loss of worker time in India, how about in Vietnam… in Canada? How about the cost of health care, or the number of hospital beds etc? One needs to collect data, and estimate quantities. Some quantities will need to be estimated during the outbreak, like the growth rate of the number of cases in each country and the effect on this growth rate of different kinds of responses… Some numbers we will never know particularly accurately, but we will need to “borrow strength” from estimates across nearby regions, or similar cultures.

So, after specifying all that… we need to run a tremendous number of simulations and, using the posterior distribution of the estimated quantities, predict the costs of different responses. From this we will get a variety of distributions over costs for different scenarios, and can calculate what seems to be the best response choice. If we make that choice, we continue to collect data and figure out what is going on, going forward, and continue to estimate what is the best choice… possibly changing the response through time as it becomes clearer whether things work. There’s some reason to think that we should try different responses in different places, so as to collect information about what might work, and then switch people to the apparently most effective thing as time goes on.
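
To make the shape of that calculation concrete, here is a toy Python sketch of the simulate-and-compare step. Everything in it is a made-up placeholder: the two responses, the lognormal stand-ins for a posterior, the cost units and the 60 day horizon are assumptions for illustration, not estimates of anything.

```python
import random
import statistics

# Every number below is a made-up placeholder, not an estimate.
POPULATION = 1_000_000
INITIAL_CASES = 200
HORIZON_DAYS = 60

# response name -> (factor stretching the doubling time, daily cost of the response)
RESPONSES = {
    "do nothing": (1.0, 0.0),
    "social distancing": (2.0, 500.0),
}


def simulate_cost(response, rng):
    """One Monte Carlo draw of the total cost (arbitrary units) of a response."""
    stretch, daily_cost = RESPONSES[response]
    # Stand-ins for posterior draws; a real analysis would take these from a fitted model.
    doubling_days = rng.lognormvariate(1.6, 0.3)  # centered near 5 days
    cost_per_case = rng.lognormvariate(0.0, 0.5)  # QALYs, lost work, treatment
    doublings = HORIZON_DAYS / (doubling_days * stretch)
    cases = min(INITIAL_CASES * 2 ** doublings, POPULATION)
    return cases * cost_per_case + daily_cost * HORIZON_DAYS


def expected_costs(n_draws=5000, seed=0):
    """Average simulated cost of each response over many parameter draws."""
    rng = random.Random(seed)
    return {
        name: statistics.fmean(simulate_cost(name, rng) for _ in range(n_draws))
        for name in RESPONSES
    }


costs = expected_costs()
best = min(costs, key=costs.get)
for name, cost in costs.items():
    print(f"{name}: expected cost {cost:,.0f}")
print("lowest expected cost:", best)
```

In a real analysis the decision would be re-run as new data update the posterior, and the full distribution of costs would matter, not just the mean.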

More math of infectious diseases

2020 March 2
by Daniel Lakeland

Thanks to Coronavirus, let’s talk about the mathematics of infectious diseases… This is the mathematics of exponential growth. It works like this… One person has a virus in a transmissible state without severe symptoms. They walk around throughout their day contacting people. Some fraction of the people they contact contract the virus. Several days from now, those people are in the transmissible state. They go about their day, spreading the virus… Etc. The number of new cases tomorrow is some small multiple of the number of people who got the virus in the last 4 or 5 days (once you’re sick you don’t necessarily spread the virus further, and once you recover or die you don’t spread the virus at all). Of course, there are some people who never get severely ill… and they may continue to spread the virus for tens of days.

The result is that if you plot N vs time it is an exponential curve, or log(N) vs time is a line… which is exactly what you see when you click the “Logarithmic” graph of cases vs time in the Johns Hopkins data display. (Actually it’s more complicated: you first see an alarmingly fast line, then it turns over and continues at a different slope. This probably reflects our response, which is delaying and slowing the spread.) Eventually enough of the population is sick that there aren’t new people to infect, or everyone becomes extremely wary of being around sick people, and the curve stops growing exponentially. So the result is only exponential in the initial stages (described by mathematicians as “asymptotically for small times”).

Now, exponential growth is *really fast*. Most people can’t really “get” it. Think of it this way: if you start with a ruler that’s 1 foot long, and you double it every day, it takes you about 30 days to get to the moon, 39 days to get to the sun, and 57 days to get to Alpha Centauri. Since it’s 4 light years to Alpha Centauri, evidently somewhere in the first 50 days or so you substantially exceeded the speed of light extending your ruler…

Next let’s talk about bias in measurement. In general there are two ways to get a biased estimate of how many people have a disease. One is that a bunch of people get the disease, it’s not very bad, and they don’t get tested or counted. This biases you LOW. Obviously the very sick do get tested, and so they are the only ones counted. If there is variability in symptoms, and there always is, you are essentially *always* biased low looking at the “confirmed cases”. The other bias is when people stop testing for the disease and report that everyone with some broad set of symptoms probably had the disease. This biases you higher… It’s primarily an issue with rare diseases rather than epidemics.

In general how big this bias is is unknown for any given disease, but for coronavirus which is known to only cause “a bad cold” in many people, and often very mild symptoms in the young, it’ll be significantly biased low.

As of this morning (Mar 2 2020), there are 86 confirmed US cases reported on the Johns Hopkins data display, and 2 deaths, mostly involving people with existing health complications. The largest outbreak in CA is in Santa Clara County (Silicon Valley) with 6 confirmed cases. How many actual cases are there in the US? We don’t know, but given the known bias that people with mild symptoms will never be tested, and the fact that there are circulating cases (people who got it from an unknown source), it would be silly to estimate less than say 2x the number of confirmed cases, and it would be reasonable to estimate perhaps up to 10x. So that means somewhere between around maybe 170 and 860 real cases in the US today. If you look at the “other locations” graph, you’ll see that in the month of February the reported cases doubled about 6 times, or doubling every 5 days or so. If there are say 200 actual cases in the US, how many will there be by Saturday (Mar 7), when the Southern California Linux Expo is supposed to take place? The answer is perhaps 400. How many reported cases would there be? Perhaps 400 * 86/200 ~ 170. How many cases will there be 1 week from then? Around 1,050. How many by Mar 21? About 2,800, with reported cases around 1,200.
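
For anyone who wants to redo this projection with their own guesses, here is a small Python sketch. The 5 day doubling time, the guess of 200 actual cases and the constant reported/actual ratio of 86/200 come from the paragraph above; the function name is mine, and holding the reporting ratio fixed is a simplifying assumption.

```python
from datetime import date


def projected_cases(initial, days, doubling_days=5.0):
    """Cases after `days` of exponential growth with the given doubling time."""
    return initial * 2 ** (days / doubling_days)


confirmed = 86       # confirmed US cases on Mar 2
actual_guess = 200   # guessed actual US cases on Mar 2
reporting_fraction = confirmed / actual_guess  # assumed to stay constant

# Plausible range for actual cases today: 2x to 10x the confirmed count.
low_today, high_today = 2 * confirmed, 10 * confirmed

start = date(2020, 3, 2)
for target in (date(2020, 3, 7), date(2020, 3, 14), date(2020, 3, 21)):
    days = (target - start).days
    actual = projected_cases(actual_guess, days)
    reported = actual * reporting_fraction
    print(f"{target.isoformat()}: about {actual:.0f} actual, {reported:.0f} reported")
```

Swapping in a different doubling time or starting guess shows how sensitive the later dates are to those two numbers.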

The good news is this reflects a substantially reduced growth rate compared to the early days in China, when cases went from 200 to 10,000 in about 10 days (Jan 20 to Jan 30). If we had that kind of spread rate here in the US, we could expect 10,000 cases *reported* by about mid-March. That’s a lot faster than the numbers above, and even more scary. In general it’s good to slow the spread, because slowing the spread prevents the health care system from being overwhelmed and unable to care for people. Being overwhelmed leads to much higher death rates than would occur at a slower rate of spread.
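
The comparison of growth rates can be made explicit by converting each pair of counts into a doubling time. This is a minimal Python sketch; the helper names are my own, and the inputs are the round numbers quoted in this post.

```python
import math


def doubling_time(n_start, n_end, days):
    """Doubling time (days) implied by growth from n_start to n_end over `days`."""
    return days * math.log(2) / math.log(n_end / n_start)


def days_to_reach(current, target, doubling_days):
    """Days needed to grow from `current` to `target` at a given doubling time."""
    return doubling_days * math.log2(target / current)


# Early days in China: roughly 200 -> 10,000 in 10 days.
china_doubling = doubling_time(200, 10_000, 10)

# Outside China in February: about 6 doublings in 29 days.
february_doubling = doubling_time(1, 2 ** 6, 29)

# Time for 86 reported cases to reach 10,000 at the early-China rate.
days_needed = days_to_reach(86, 10_000, china_doubling)

print(f"early China doubling time: {china_doubling:.2f} days")
print(f"February (outside China) doubling time: {february_doubling:.2f} days")
print(f"days from 86 to 10,000 at the China rate: {days_needed:.1f}")
```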

So, what can we do about all this? The number one thing for a virus whose symptoms are sometimes mild but occasionally very bad is to start infection control procedures *early*. Social distancing is the term used for things like closing schools, working from home, canceling conferences, canceling sporting events, etc. When should we start social distancing? The answer is basically right now, or at least very soon, on the order of 7 to 10 days from now. Remember exponential growth? In China on Jan 20 the epidemic would have seemed not that big a deal: 278 people in China were reported as having the disease. By Jan 30 it was 10,000! Anything we can do to slow the spread of the virus, thereby reducing the number of cases that need severe treatment at any given time, will save lives.

I won’t be going to the SCALE linux conference, even though it’s right here in Pasadena, and even though my kids get free entrance through their schools. It simply doesn’t make sense for our family, as the value we’d get isn’t high enough to overcome the general risk of being around maybe ten thousand people or more milling around sneezing. And they will be sneezing… it’s allergy season in LA.

More on QM Probability vs Frequency

2020 February 28
by Daniel Lakeland

So in my previous post, I mentioned the possibility of using Bohmian deterministic mechanics to discover the probability that a photon went through a given slit if you observe it flash at a certain spot on your detector. In the setup there’s a shutter that opens and closes the second slit randomly according to radioactive decay or the like. Let’s see how we can use this information.

The frequency distribution for photons with both slits open is an “interference pattern” which has an oscillatory nature. For example from the Wikipedia article on the double slit experiment:

Single vs Double Slits
A single slit has no fast-oscillating pattern but still develops a diffraction-spreading pattern. The double slit pattern shows the fast oscillations with regions of “dropout”, dark regions even in the center of the pattern.

So, suppose we observe a photon in the general brightest central region. Suppose that it flashes within one of those “dark bands” that the double-slit pattern shows. Obviously psi_double^2 is very small in this region whereas psi_single^2 is large. Therefore the posterior probability that the particle went through the first slit because the second slit is closed… is very high. On the other hand, if we see the flash in one of the regions that is bright in both the diffraction and the interference pattern, then we have a harder time knowing whether the second slit was open or closed, though if the brightnesses are slightly different, then we infer that one vs the other was more probable.

What about the situation where we know the second slit is open, and we see a flash at a particular spot? Consider the 2 slit picture from above. If the flash comes from say far to the right where the diffraction pattern is quite dark, but the interference pattern has more light… Then when we run the Bohmian mechanics we will probably find that the photon came from one or the other slit with higher probability. Not having done the calculations I just don’t know, but let’s suppose for example it has an 80% chance of coming through the second slit. What does this matter? In particular, the pattern is the way it is because the apparatus is the way it is… in other words the second slit is open, there is nothing interfering with the passage of particles through that slit, there are no special magnetic fields or electron clouds or glass pieces or anything in the way, and so the physical scenario is such that the wave function does a particular thing, and voila… If we did something to perturb the apparatus, a different wave function would get set up, and a different path would be taken by particles initially coming from the same place as the original particle, and so it wouldn’t hit in the same place and might go through a different slit. What physical consequence can our knowledge that the particle had a high probability of coming through a given slit have? It’s a fact about the past, so we can’t act on it to change the past. It might matter if it would help us decide whether the particle might have interacted with some apparatus along the way, but if there were an apparatus along the way, the wave function would have been different and we’d have a different probability that the particle went along the path.

It seems to me this is one of the essential features of the problem of “whether a particle went through one or the other slit or both”. People whose interpretation of QM is that the particle doesn’t exist until it hits our detector are interpreting “there is no physically observable consequence of inferring the path that a particle took in the past” as evidence that “the path that the particle took in the past doesn’t exist”. This is rather odd. The fact is, by coupling our knowledge that the path might more likely have been X to knowledge of what things might have affected that path (such as the shutter) we can potentially infer that stuff we don’t know was more likely to be one way or another… For example, perhaps we can infer that there was unlikely to have been radioactive decay in the shutter mechanism.

The importance of Probability vs Frequency in Quantum Mechanics

2020 February 27
by Daniel Lakeland

There’s been some discussion recently on Andrew Gelman’s blog about Quantum Mechanics and probability… For example here. In that thread I raise the following thought experiment.

We set up a classic “two slit” apparatus. A laser fires single photons towards an intermediate screen with two slits in it, and then on towards a white screen where the photon position can be recorded (say by a fancy CCD).

One of the slits in the intermediate screen has a little shutter which can be open or closed and which is driven by a source of quantum noise. For example, every time a Geiger counter detects a radioactive decay the shutter flip-flops, or it goes back and forth based on some radio noise from the atmosphere or something, but in the long run the fraction of the time that it’s open = 1/2.

Once the far detector receives the photon and records its position, the apparatus beeps. Finally, a photograph of the position of the shutter is also taken at the time the photon is fired, so we can determine whether it was open or closed, but only by reviewing the record.

Now, let’s talk about Probabilities, denoted P, taken to mean Bayesian plausibility measures over facts about the world… and Frequencies denoted F counting how often a given thing happened in an ensemble of those things. Let’s assume that in addition to whatever I condition on below, we also add | Setup, that is, assuming our knowledge of the experimental setup as described above.

  1. Write down the probability p( Flash at X | Beep)
    1. Note that all we have is our knowledge of the setup, and the fact that a photon was received at some point on the detector. We would use our QM knowledge to calculate Psi(x)^2 for the two cases, one with the shutter open and one with the shutter closed, and create a 50/50 mixture model of the two.
  2. Write down the probability p(Photon went through the first slit | Beep)
    1. This is intended to be a trick question. It stabs right at the heart of QM interpretation. As far as I can tell, there are *some* interpretations of QM in which a photon has a well defined position at all times (nonlocal hidden variable theories such as Bohm’s) and *some* interpretations in which the photon doesn’t exist until it comes into being by colliding with the final detector (this is generally how the Copenhagen interpretation looks, though it doesn’t seem to me to be a well defined single interpretation, but for example this is how Griffiths describes the interpretation in the intro to his standard textbook ~ pg 6). And maybe some other interpretations, like the Many Worlds one where the photon goes through both slits, it’s just a question of which world we happen to be in.
    2. Nevertheless, if we take a Bohmian type interpretation, then based on only the Beep, we can say there is a 50% chance the shutter was closed, so it must have gone through the first slit, and a 50% chance the shutter was open, and if the shutter was open things are more complicated… see below.
  3. Write down the Probability p(second slit was open | flash at X, Beep) (in this case the Beep just tells us that the photon fired… so we don’t have to include the option “no photon has landed yet”, we’ll drop the Beep)
    1. By Bayes’ rule, p(flash at X | second slit open) p(second slit open) = p(second slit open | flash at X) p(flash at X)
    2. For p(flash at X), we use our knowledge of the apparatus and our only way of assigning probability, which is to calculate psi^2 for each situation and mix them: psi_open^2 * 0.5 + psi_closed^2 * 0.5. Also, p(second slit open) is just 0.5 and p(flash at X | second slit open) is psi_open^2, so we have:
    3. p(second slit open | flash at X) = psi_open^2 * 0.5 / (psi_open^2 * 0.5 + psi_closed^2 * 0.5)
  4. Now calculate p(Second slit was open | flash at X, photo of second slit, Beep)…
    1. Trick question, photo of second slit tells us all we need to know about whether the second slit was open or closed. This is either 1 or it’s 0.
  5. Let’s start quantifying our knowledge of where the photon went under additional information… write down p(photon went through the first slit | flash at X, photo of second slit, Beep)
    1. You may see where this is going. If we know from a photo that the second slit was closed, then the photon to the extent that we allow it to have a trajectory, must have gone through the first slit.
    2. On the other hand, if we show that the shutter was open, then the photon either went through the first slit or the second slit, but we don’t know which. If we go along with Bohm, information about where it struck the detector should inform us somewhat about which slit it went through… So we calculate the wave function, and the strange trajectory of the particle. We run an Approximate Bayesian Computation type calculation. We select a photon initial position at the aperture of the laser according to our best guess of the distribution of photons at the aperture (let’s say uniform across the aperture), we propagate that photon through space according to Bohm’s equation, and we observe where it hit on the final screen. We do this in a tremendously large number of trials, taking only those photons that actually strike the screen within the range x +- epsilon where epsilon is the width of the CCD pixel or whatever. Then we calculate which fraction of these photons went through the first slit. This is p(photon went through the first slit | flash at X, both slits open).
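
The posterior in item 3 of the list above, p(second slit open | flash at X), is easy to compute numerically once psi_open^2 and psi_closed^2 are written down. Here is a rough Python sketch that uses textbook far-field (Fraunhofer) patterns as stand-ins for the two densities. The wavelength, slit width, slit separation and screen distance are made-up illustrative values, and the tiny sideways offset of the single-slit pattern is ignored.

```python
import math

# Illustrative apparatus parameters (assumptions, not from the post).
WAVELENGTH = 650e-9        # m
SLIT_WIDTH = 20e-6         # m
SLIT_SEPARATION = 100e-6   # m
SCREEN_DISTANCE = 1.0      # m


def sinc2(u):
    """(sin(u)/u)^2, with the limiting value 1 at u = 0."""
    return 1.0 if u == 0 else (math.sin(u) / u) ** 2


def single_slit(x):
    """Unnormalized psi_closed^2: diffraction envelope of one slit."""
    return sinc2(math.pi * SLIT_WIDTH * x / (WAVELENGTH * SCREEN_DISTANCE))


def double_slit(x):
    """Unnormalized psi_open^2: fast fringes times the same envelope."""
    phase = math.pi * SLIT_SEPARATION * x / (WAVELENGTH * SCREEN_DISTANCE)
    return math.cos(phase) ** 2 * single_slit(x)


def normalizer(intensity, half_width=0.5, steps=100_001):
    """Riemann-sum integral of an intensity over [-half_width, half_width]."""
    dx = 2 * half_width / (steps - 1)
    return sum(intensity(-half_width + i * dx) for i in range(steps)) * dx


Z_CLOSED = normalizer(single_slit)
Z_OPEN = normalizer(double_slit)


def posterior_open(x, prior_open=0.5):
    """p(second slit open | flash at x) from the 50/50 mixture model."""
    p_open = double_slit(x) / Z_OPEN * prior_open
    p_closed = single_slit(x) / Z_CLOSED * (1 - prior_open)
    total = p_open + p_closed
    return prior_open if total == 0 else p_open / total


# First dark fringe of the double-slit pattern.
DARK_BAND = WAVELENGTH * SCREEN_DISTANCE / (2 * SLIT_SEPARATION)

print("flash at center:", round(posterior_open(0.0), 3))
print("flash in first dark band:", posterior_open(DARK_BAND))
```

A flash in a dark band makes it nearly certain the second slit was closed, while a flash at a bright fringe only moderately favors it being open. The path probability in item 5 needs the Bohmian trajectory simulation and is not attempted here.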

Now, let’s examine instead the frequencies:

  1. F(flash at X | beep) = either 1 or 0, you have to ask the CCD if the flash occurred at X and find out. At the moment, at best we can put a Bayesian probability on this F. The Bayesian probability could be calculated from calculations above!
  2. F(flash at X | CCD) = {1,0} one or the other, our Bayesian probability of the frequency being one or the other collapses down to either 1, or 0. Is this “wave collapse?” no, it’s conditioning on information.
  3. Write down F(second slit was open) = {0, 1} either 0 or 1 depending on what actually happened. However we can put a Bayesian prior of 1/2, 1/2 on each because of how we arranged the flip-flop shutter.
  4. Write down F(second slit was open | record of the shutter) = a single number, either 0 or 1; just look at the record of the shutter position and find out. Again, this is not wave collapse: the value was determined by whether or not the Geiger counter detected a decay.
  5. Write down F(second slit was open | flash at X) = {0, 1} depending on what X is… If X is the actual value of the X where the flash occurred, then = 1 otherwise = 0.

Clearly, we drive a strong wedge here between the interpretation of probability (meaning plausibility of what happened given information that we have) and frequency in repetitions. Furthermore we make a strong argument for the utility of a Bohmian viewpoint, because *it lets us calculate the probability that a quantum particle went through a particular region of space on its way to interacting at a detector*. Classically speaking, a Copenhagen interpretation says “the particle doesn’t exist, or the question of where the particle is is not meaningful until it is detected”. For Bohm, this is bunk. Conditional on knowing where the particle landed, we have a straightforward way to back out which paths are more or less likely…

Is that a desirable property of a theory? That it gives us probabilities for intermediate outcomes? It is to me. Is it desirable enough to put up with the nonlocality of Bohm’s equation? I actually think the nonlocality of his equation is pretty nifty, I’m not sure what the heck is wrong with physicists that they tend to reject that outright. It seems like a lot of them are wishy washy on this topic. I *can* understand why physicists would not want classical information traveling faster than light. But it doesn’t seem Bohm’s theory allows this anyway, so it’s not a real objection.

Nothing to see here… move along (regression discontinuity edition)

2020 January 9
by Daniel Lakeland

From Gelman’s blog, he shows yet another regression discontinuity. Apparently people have never heard of the Runge phenomenon, or the basis for why it happens. Here’s some R code, and a PDF of the output…

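## regression discontinuity such as:
##
## generally is garbage that ignores what's essentially a corollary to
## the well known Runge phenomenon... we demonstrate here.

library(ggplot2)

## 20 datasets of pure noise: x uniform on [0,2], y t-distributed with 5 df
datasets = list()
for (i in 1:20) {
    datasets[[i]] = data.frame(x=runif(20,0,2),y=rt(20,5))
}

## scatterplot with separate linear fits on each side of x = 1
plotgraph = function(d){
    g = ggplot(d,aes(x,y)) + geom_point() + geom_smooth(data=d[d$x < 1,],method="lm") + geom_smooth(data=d[d$x >= 1,],method="lm")
    return(g)
}

graphs = lapply(datasets,plotgraph)

## draw every plot (wrap in pdf(...) and dev.off() to collect them in one file)
for (g in graphs) print(g)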

In almost every plot there is “something going on” at the discontinuity, either the level of the function has changed, or the slope, or both. And yet, the whole thing is random t-distributed noise…

I don’t know what that paper did to calculate its p values, but it probably wasn’t simulations like this, and it should have been.
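
As a rough sketch of the kind of simulation I mean, here is a stdlib-only Python version (Python rather than R so it runs without any packages). It generates pure-noise datasets like the ones above, fits a straight line on each side of x = 1, and records the apparent "jump" at the cutoff. The function names, the seed and the choice of 2000 repetitions are my own assumptions.

```python
import random
import statistics


def t_draw(rng, df=5):
    """Student-t draw: standard normal divided by sqrt(chi-square / df)."""
    chi2 = rng.gammavariate(df / 2, 2)
    return rng.gauss(0, 1) / (chi2 / df) ** 0.5


def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in points) / sxx
    return ybar - slope * xbar, slope


def jump_at_cutoff(rng, n=20, cutoff=1.0):
    """Apparent discontinuity at the cutoff in one dataset of pure noise."""
    while True:
        data = [(rng.uniform(0, 2), t_draw(rng)) for _ in range(n)]
        left = [p for p in data if p[0] < cutoff]
        right = [p for p in data if p[0] >= cutoff]
        if len(left) >= 2 and len(right) >= 2:  # need two points per fit
            break
    a_left, b_left = fit_line(left)
    a_right, b_right = fit_line(right)
    return (a_right + b_right * cutoff) - (a_left + b_left * cutoff)


rng = random.Random(1)
jumps = [jump_at_cutoff(rng) for _ in range(2000)]
abs_jumps = [abs(j) for j in jumps]

noise_sd = (5 / 3) ** 0.5  # standard deviation of a t distribution with 5 df
median_abs_jump = statistics.median(abs_jumps)
share_big = sum(a > noise_sd for a in abs_jumps) / len(abs_jumps)

print(f"median |jump| at cutoff: {median_abs_jump:.2f}")
print(f"share of datasets with |jump| > 1 noise sd: {share_big:.2f}")
```

Comparing an observed jump to this simulated null distribution gives a simulation-based p value for "there is a discontinuity", which is the check I would want to see.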

Deborah Mayo on using Frequentist p values as pseudoBayes

2019 December 19
by Daniel Lakeland

Deborah Mayo considers using “p-values as a kind of likelihood in a quasi-Bayesian computation” as an “erroneous interpretation” of the meaning of Frequentist statistics.

My assertion is that among people doing more than just a couple t tests, particularly among people using Random Effects or Mixed Effects models, they are *already* doing a shabby sort of Bayesian modeling without taking responsibility for including real useful prior information, or doing appropriate model checking etc. It’s time to recognize that “Bayes with a flat prior and a linear predictor function” is not “Objective, Frequentist Statistics” it’s low grade passive Bayes.

Are “frequentist” models actually Frequentist?

2019 December 3
by Daniel Lakeland

I had a conversation with Christian Hennig that enlightened me about some confusion over what qualifies as Frequentist theory vs what qualifies as Bayesian theory.

Of course we’re free to create technical jargony definitions of things, but essential to my conception of Frequentism is the idea that probability is placed on observable outcomes only, so that in principle you could if necessary get a large sample of things and have the real actual frequency distribution in the form of say a histogram. Then you could say that a given parametric distribution was good enough as a model of that histogram for example. I quoted some wikipedia text which essentially said the same thing, and is consistent with various sources you might find online or in books. In general at least if this isn’t the only definition of Frequentism, it’s a common one.

The alternative, Bayesian viewpoint in my view was that you use probability distributions either on quantities that don’t vary (like say the speed of light or the gravitational acceleration in your lab during your Wednesday morning experiment), or you use distributions over things that vary which are notional and have no validated connection to observed frequency, but just represent your own information about how widely things may vary.

The question “could this random number generator have produced our dataset” which is essentially the purpose to which a p value is put, is not exclusively the realm of Frequentist statistics. Every time we fit a Bayesian model we could be considered to be asking that question over and over again and using the likelihood of the data under the RNG model to answer the question, and determine whether to keep our parameter vector in the sample or not.
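
To illustrate that last point, here is a minimal Python sketch of a Bayesian fit done by plain rejection sampling, which makes the "keep this parameter or not" step explicit. The normal likelihood with known sigma, the uniform prior on (-5, 5) and the simulated data are all assumptions for the example; real samplers are cleverer, but the logic is the same.

```python
import math
import random
import statistics


def log_likelihood(data, mu, sigma=1.0):
    """Log-likelihood of the data under an RNG model Normal(mu, sigma)."""
    const = math.log(sigma * math.sqrt(2 * math.pi))
    return sum(-0.5 * ((y - mu) / sigma) ** 2 - const for y in data)


def rejection_posterior(data, n_keep, rng, prior_low=-5.0, prior_high=5.0):
    """Posterior draws for mu under a uniform prior, by rejection sampling.

    Each proposal asks "could this RNG have produced our dataset?" and is
    kept with probability proportional to the likelihood of the data.
    """
    # With known sigma the likelihood is maximized at the sample mean.
    best = log_likelihood(data, statistics.fmean(data))
    kept = []
    while len(kept) < n_keep:
        mu = rng.uniform(prior_low, prior_high)
        if rng.random() < math.exp(log_likelihood(data, mu) - best):
            kept.append(mu)
    return kept


rng = random.Random(42)
data = [rng.gauss(2.0, 1.0) for _ in range(10)]  # simulated observations
posterior_draws = rejection_posterior(data, 1000, rng)

print(f"sample mean: {statistics.fmean(data):.2f}")
print(f"posterior mean: {statistics.fmean(posterior_draws):.2f}")
print(f"posterior sd: {statistics.stdev(posterior_draws):.2f}")
```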

What this means is, a lot of people are using what you’d call Bayesian methods under this classification scheme, without really thinking about it. For example linear mixed models or “random effects” models… These are hierarchical models in which essentially the likelihood function based on a normal distribution is *very* rarely questioned against the data (ie. goodness of fit tests run) and the distribution over the “random effects” is essentially *never* tested against a repeated sampling process. This means the distribution over the random effects process is just a Bayesian distribution as it represents how likely it is to get effects of that size. It is typically taken as a normal distribution with mean 0 and standard deviation equal to essentially some maximum likelihood estimate (the standard deviation is found during the numerical fitting process I think).

In fact, there are plenty of situations where a mixed effects model is used and there are a finite few groups involved. The simplest would be 2 groups, but let’s even say 8, like in the “8 schools” example. The distribution of individual effects in the 8 schools *can not* be a normal distribution. In fact, these must be just fixed values one for each school. The notion that these 8 are a sample from a normal distribution is entirely notional, and has no direct connection to observable frequency.

The only thing these kinds of models don’t have is an explicit proper prior. And Bayes isn’t just “frequentism with priors”, it’s probability as a measure of credence of an idea. People are already doing Bayesian analysis every time they run a mixed effects model, they just are dodging some of the responsibility for it by hiding it under the hood of “well accepted practices codified into the lme4 library” or some such thing.

Next up: Using an ABC like process and a likelihood based on a p value, we can construct confidence intervals, showing that confidence intervals are just shitty flat posterior intervals.

RCTs are not a club, and other stories of science as rhetorical gloss

2019 November 8
by Daniel Lakeland

Increasingly, “science” is being used as a club to wield in order to have power, for example the power to influence policy, or to get career advancement, or the like. The stories are legion on Andrew Gelman’s blog: Brian Wansink is an excellent example, he received millions in funding to do his “food industry” research… that didn’t have a hope of being correct.

So when my friend posted this video claiming that randomized controlled trials have proven that sugar does not cause kids to become hyperactive, I was skeptical, to say the least.

But at least it’s based on the “gold standard” of randomized controlled trials, and presented by someone with a biology background who understands and is an expert on this topic right? (actually, sure he’s an MD, but his “research focuses on the study of information technology to improve pediatric care and areas of health policy including physician malpractice…” and he’s publishing on an Econ blog. Is he perhaps a bit too credulous of this research?)

So here is where we unpack the all too common situation where meta-analysis of various randomized controlled trials is used as a club to beat people with, regardless of what those trials even say.

Let’s start with the meta-analysis linked on the YouTube video. This is Wolraich et al. 1995, a review of quite a few different somewhat related RCTs… which is hidden behind a paywall. If you manage to access it you will find that it cites 37 studies. Some of the results are summarized in this graph:

Hmm… what is up with Sarvais et al? (actually the correct spelling of her name is Saravis, not Sarvais). In any case, I looked through the cited research summarized in table 1.

As you can see, this is a lot of studies of ADHD, “sugar reactors”, “Delinquent”, “aggressive”, “Prader-Willi syndrome” and other unusual populations. Furthermore, the vast majority have 100% male subjects. Finally, almost all of these studies use a sweet beverage as their “control”. Unless they have an additional control for a “not sweet” condition, this means they have zero chance to understand the difference between giving a kid, say, a sugary beverage vs water, or a candy bar vs a piece of whole wheat bread. In other words, sweetness may itself be the mechanism by which sugar makes kids hyperactive. If you were a biology major and proposed a mouse experiment like this to my wife she would say “where’s your negative control?” and send you back to the lab to think about how to run an experiment. Some of the papers do have a negative control.
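To make the negative control point concrete, here is a minimal toy simulation. It is purely illustrative: the model (activity depends only on whether the food is sweet), the baseline, the effect size, the noise level, and the sample size are all made-up assumptions, not numbers from any of the cited studies. Under that assumed model, comparing sugar to aspartame shows essentially nothing, while comparing sugar to water recovers the effect.

```python
import random
from statistics import mean


def simulate_arm(rng, n, is_sweet, sweetness_effect=1.0, noise_sd=1.0):
    """Activity scores for one arm of a toy model where ONLY sweetness matters.

    All numbers here (baseline 5.0, effect 1.0, noise 1.0) are hypothetical.
    """
    baseline = 5.0
    shift = sweetness_effect if is_sweet else 0.0
    return [baseline + shift + rng.gauss(0.0, noise_sd) for _ in range(n)]


def compare_arms(seed=0, n=2000):
    """Simulate three arms and return the two mean differences of interest."""
    rng = random.Random(seed)
    sugar = simulate_arm(rng, n, is_sweet=True)       # sweet, has sugar
    aspartame = simulate_arm(rng, n, is_sweet=True)   # sweet "control"
    water = simulate_arm(rng, n, is_sweet=False)      # true negative control
    return {
        "sugar_minus_aspartame": mean(sugar) - mean(aspartame),
        "sugar_minus_water": mean(sugar) - mean(water),
    }


result = compare_arms()
print(result)
```

If sweetness is the mechanism, a design with only sweet arms reports “no difference” no matter how large the real effect is. Only the arm without sweetness can reveal it.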

Among the “normal” populations, the number of subjects is typically less than around 25… I wasn’t going to chase down all these studies myself, so I decided to look at a selected few that use a “normal” population and have more than just a handful of subjects.

The ones I chased down were: Rosen et al 1988, Saravis et al 1990, and Wolraich et al. 1994

Let’s start with the last one, also by Wolraich. Like a number of these studies, Wolraich et al. 1994 is designed to investigate whether an overall diet of high sugar leads through time to an overall increase in hyperactivity… Although this is an interesting question, it is not the relevant question being discussed in the YouTube video. When a parent says their kid gets hyperactive after eating sugar, what they mean is “my kid got a candy bar from his friend and ate it, and now he’s bouncing off the walls for the next hour while I’m trying to get him to get ready for bed”. Of course there are kids with more general hyperactivity issues, but that is a more complicated issue. The YouTube video strongly suggests that sugar in any form, bolus or through time, never makes kids hyperactive ever.

The Wolraich et al 1994 article says: “The subjects and their families were placed on a different diet for each of three consecutive three-week periods. One of the three diets was high in sucrose with no artificial sweeteners, another was low in sucrose and contained aspartame, and the third was low in sucrose and contained saccharin (the placebo).” So we can totally ignore this study, as the design simply doesn’t address the question, and it fails to have any negative control. But, even if we don’t ignore it, what do we find… Table 4 is a summary (stupid, where’s the graph?)

It’s clear, from a Bayesian perspective, that the study was vastly underpowered to detect anything of interest. For example, in the “Parent’s rating of behavior… Conduct” the numbers are 8.1 ± 6.7, meaning the expected random noise is almost as big as the average score… How meaningful is this rating if people ranged from say 1 to 14? Furthermore, the scores were complicated tests of a wide variety of things “administered in the mobile laboratory each week on the same day of the week and at the same time of day”. Any short term effect of a bolus of sweet foods would only be detectable if they had just given them the foods in the minutes before testing. So the results of this study are simply *irrelevant* to the main question at hand.
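The arithmetic behind “noise almost as big as the average” is quick to check. This sketch assumes the reported spread of 6.7 is a standard deviation (the table summary quoted here does not say so explicitly), and the variable names are just for illustration.

```python
# Reported "Parent's rating of behavior... Conduct" summary: 8.1 +- 6.7.
# Assumption: the 6.7 is a standard deviation, not a standard error.
mean_score = 8.1
sd_score = 6.7

# Ratio of noise to signal: close to 1 means the spread rivals the mean.
noise_to_mean = sd_score / mean_score

# One-standard-deviation band around the mean.
low = mean_score - sd_score
high = mean_score + sd_score

print(f"noise/mean = {noise_to_mean:.2f}, one-SD band = {low:.1f} to {high:.1f}")
```

The ratio comes out around 0.83, and the one-SD band runs from roughly 1.4 to 14.8, which is where the “say 1 to 14” range comes from.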

Let’s move to Rosen et al 1988: this study at least is designed to measure a bolus of sugar… of course it’s based on two studies with 30 preschool kids and 15 elementary school kids. They designed the study around a “high-sugar condition, a low-sugar condition, and a control aspartame condition (low in sugar but with a sweet taste).” The diet was randomized for each kid each day, and everyone was blinded as much as possible (obviously, the kids could tell if their diet was sweet or not, but weren’t told if it had sugar vs aspartame because no-one involved supposedly knew). The manipulation was by giving the kids controlled breakfast meals. The children were tested

“approximately 20-30 min following the completion of breakfast… on several measures sensitive to cognitive functioning…” as well as “each day teachers completed… [a] global rating scale… immediately preceding the child’s lunch time…to reflect the child’s behavior for the entire morning.”

The research is under-powered to really measure much of anything as well, but when they looked at global rating across all kids they found

“An analysis of the means for this measure revealed that the ratings during the high-sugar condition (M = 6.2, SD = 1.08) were only slightly, although significantly (Tukey p < .05) higher than those during the low-sugar condition (M = 5.9, SD = 1.04) (see Figure 2). The control condition (M = 6.0, SD = 1.07) was not significantly different from either the high or low condition. No other significant differences were observed for the teacher rating measures.”

So by the “rules” of standard frequentist statistics, we should determine that “high sugar” leads to hyperactivity (the scale increases with increasing hyperactivity), so this is (very weak) evidence, but it tends *against* the claim in the YouTube video.
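To get a feel for how small this difference is, we can turn the reported means and SDs into a rough standardized effect size. This is only a sketch. It treats the high-sugar and low-sugar ratings as two equal-sized independent groups, which is an assumption on my part: the actual design gave each kid each diet on different days, so a proper within-subject analysis would need data not quoted here.

```python
import math

# Teacher global ratings as reported by Rosen et al 1988.
high_mean, high_sd = 6.2, 1.08   # high-sugar condition
low_mean, low_sd = 5.9, 1.04     # low-sugar condition

# Pooled SD under the (assumed) equal-group-size, independent-groups approximation.
pooled_sd = math.sqrt((high_sd**2 + low_sd**2) / 2)

# Cohen's d: difference in means in units of the pooled SD.
cohens_d = (high_mean - low_mean) / pooled_sd

print(f"pooled SD = {pooled_sd:.3f}, Cohen's d = {cohens_d:.3f}")
```

A difference of 0.3 rating points against a spread of about 1.06 gives d of roughly 0.28: small, in the direction of “sugar increases activity”, and nothing like proof that there is no effect.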

Now we get to the Saravis et al. paper. This paper fails to have a non-sweet control. The goal of the study was to determine if there was some difference between sugar vs aspartame, and not to determine the difference between “sweet” and “not sweet” diets. So right away it’s going to be weak evidence as to whether something like “feeding your child candy will cause them to be hyperactive for the next few tens of minutes”… But we’ll soldier on anyway, especially because if you look back above at the meta-analysis graph, you’ll find Saravis (misspelled Sarvais in the graph) is an utter outlier in terms of results…

The design of the Saravis study is that:

“Two experiments were conducted. The objective of the first study was to compare the effects of consuming a large dose of aspartame or sodium cyclamate…with carbohydrate on measures of learning, behavior, and mood in children. The treatment consisted of an ice slurry … of unsweetened strawberry Kool-Aid containing carbohydrate (as Polycose …) plus either the test dose of aspartame or the equivalent in sweetness as sodium cyclamate…”

“The objective of the second study was to compare the effects of aspartame with a common nutritive sweetener (sucrose) on the same variables. In this experiment, the treatments consisted of … Kool-Aid, plus either 1.75 g/kg of sucrose or the equivalent sweetness of aspartame. The drink with sucrose provided the same amount of carbohydrate as in the first experiment.”

So the study is always giving the kids a sweet drink, and only the second study even varies the amount of carbohydrate in the dose… The studies used 20 children (10 boys and 10 girls), ages 9-10… We’ll soldier on…

Guess when they gave the high sweetness drink? Well they claim it was at 10:30, but if you know anything about a pre-school, you’d know that it takes time to pour out the drinks, get the kids in line, etc etc. So you can guess that at 10:25 the kids were aware that they were about to get a bolus of sweetness… and of course their activity level is very high right at that point in time, and continuing.

So the study is not designed to test the main question of interest, “if you give kids candy/sweets do they go bonkers”, but to the extent that they did give kids sweet stuff… they did go bonkers afterwards, independent of what kind of sweet stuff… Because there was “no difference” between the two kinds of sweet syrupy Kool-Aid, this is taken by the YouTube video as evidence that “sugar doesn’t cause hyperactivity”, essentially the opposite of the conclusion I draw, which is that sugar, and other super-sweet foods, given in a bolus, may very well make kids crazy, but the study actually can’t really determine that… Maybe just getting kids in line and giving them anything makes them crazy… who knows. Welcome to modern science, where a study that is completely incapable of determining anything of use, under-powered to determine anything precisely, and nevertheless entirely consistent with our prior guess, is taken as definitive proof that our prior expectation is completely false. RCT as billy-club, wielded by econo-physician to show that stuff “everyone knows” is nevertheless wrong and only heroic econo-physician can save us from irrationality.