Updated state by state graphs with bug fix

2020 May 17
by Daniel Lakeland

I discovered that the method I was using to smooth the cases per day had a bug.

The shapes of the cases-per-day curves were right, but the overall scale was too low. Basically what I was doing was convolving with the derivative of a smoothing kernel… But the derivative you’re trying to calculate is (f(x+dx)-f(x-dx))/(2dx), so when you’re averaging across multiple sizes of dx you need to take the size of each dx into account… fixed. Now the grey points are the raw data, the black line is the short-term smoothed data, and the blue line is the ggplot smoother.
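To make the scaling issue concrete, here is a minimal sketch in R. It is not the actual smoothing script: the data, the set of dx values, and the "bad" variant are all assumptions chosen only to show how averaging central differences over several widths can shrink the scale if each difference is not divided by its own 2dx.

```r
## Hypothetical data: cumulative cases growing by exactly 10 per day,
## so the true cases-per-day rate is 10 everywhere.
cum <- cumsum(rep(10, 50))
dxs <- 1:3   # half-widths to average over (an assumed choice)
i <- 25      # day at which to estimate the rate

## Each central difference is divided by its own 2*dx before averaging.
rate_ok <- mean(sapply(dxs, function(dx) (cum[i + dx] - cum[i - dx]) / (2 * dx)))

## One way the scale can go wrong (an illustration, not necessarily the
## original bug): dividing every difference by the same, largest width.
rate_bad <- mean(sapply(dxs, function(dx) (cum[i + dx] - cum[i - dx]) / (2 * max(dxs))))

print(c(ok = rate_ok, bad = rate_bad))
```

The shape information is the same in both versions; only the normalization differs, which is why the curves looked right but sat too low.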

I wasn’t using this method for deaths per day, though maybe I should be now… in any case here are the current versions.

Updated state by state graphs

2020 May 7
by Daniel Lakeland

Here’s the current status…

State by state graphs of COVID-19 data (from covidtracking.com)

2020 April 25
by Daniel Lakeland

I’ve got a script that grabs data from covidtracking.com and generates several PDFs that give an overview of the pandemic situation, one graph per state… I’ll try to update the graphs about weekly. But here are the ones as of today.

Cryptographically Distributed COVID contact tracing through WiFi ad-hoc networking

2020 April 10
by Daniel Lakeland

This is a quick note to try to sketch out an idea that I thought up about how to have people cooperatively determine if they have come in contact with a COVID patient. Here’s the basic idea.

Every Android or iPhone generates a random UUID. Then, when walking around, the phones periodically beacon out a peer-to-peer SSID on their WiFi radios called something like COVID-CONTACT-{UUID}. Everyone’s phone scans the surroundings for stations, and for whatever stations it hears, it records the UUID of the phone.

Now… at the end of the day, each phone uploads a one-way cryptographic hash of its own UUID, along with hashes of the UUIDs that it contacted today.

If a person tests COVID positive, they upload a record of their cryptographic hash, and the fact that they tested positive.

Now, every day you look at all the contacts you’ve contacted in the last ~ 10 days, you hash those, and you see if any of those hashes report being COVID positive. Also, you look up all the COVID positives in the last 10 days, and you see if they report having contacted YOUR hash…
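The two lookups described above can be sketched in R roughly as follows. This is only an illustration under stated assumptions: it uses the `digest` package for SHA-256, random 16-byte hex strings stand in for real UUIDs, and the `uploads` table layout, the names `alice`, `bob`, `carol`, and the helper functions are all hypothetical.

```r
library(digest)  # assumed to be installed; digest() supports algo = "sha256"

## Stand-in for a random 128-bit UUID: 16 random bytes written as hex.
new_id <- function() paste(sprintf("%02x", sample(0:255, 16, replace = TRUE)), collapse = "")
## One-way hash of an ID string.
h <- function(id) digest(id, algo = "sha256", serialize = FALSE)

alice <- new_id(); bob <- new_id(); carol <- new_id()

## Hypothetical server-side table: only hashes are uploaded.
uploads <- data.frame(owner   = c(h(alice), h(bob)),
                      contact = c(h(bob),   h(alice)),
                      stringsAsFactors = FALSE)
positives <- c(h(bob))   # Bob tests positive and uploads his own hash

## Alice's phone: hash the UUIDs it heard and compare against the positives...
alice_heard <- c(bob)
exposed_direct <- any(sapply(alice_heard, h) %in% positives)
## ...and check whether any positive reports having contacted Alice's hash.
exposed_reverse <- any(uploads$owner %in% positives & uploads$contact == h(alice))

## Carol met nobody, so neither lookup finds her.
carol_exposed <- any(uploads$owner %in% positives & uploads$contact == h(carol))
```

Both lookups start from a raw UUID that only the phone itself holds, which is the property the scheme relies on.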

Now, there’s probably some subtlety to this which requires working out by people who are more crypto nerdy than I am, but the dataset is such that you can’t determine whether A contacted B unless you know the UUID of *both* parties. Since the UUID itself is stored internal to the app and never sent to anyone else, basically the UUID itself is a secret, and it’s only possible to determine if A contacted B or vice versa if you are in fact either A or B.

Of course, you could just try EVERY UUID that’s possible… Good luck with that, since there are ~ 2^128 = 340282366920938463463374607431768211456 of them. If you tried 1 million per second, it’d take about 10^25 years to try them all.
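As a quick check of that brute-force estimate, the arithmetic can be done in R. The only assumption added here is a year length of 365.25 days.

```r
n_ids <- 2^128                        # number of possible 128-bit IDs
per_second <- 1e6                     # guesses per second
seconds_per_year <- 365.25 * 24 * 3600
years <- n_ids / per_second / seconds_per_year
print(years)                          # on the order of 10^25
```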

So, is this a viable non-invasive contact tracing strategy? What am I missing?

Grocery handling, good bad or ugly?

2020 April 1
by Daniel Lakeland

Apparently this guy’s video is controversial:

I’m going to come right out and say this is a great video: it shows people how to handle objects in a way that minimizes transmission of virus from surfaces. Apparently the controversial part though is where he dumps his oranges in soapy water? Are you kidding me? Everyone should be washing their produce at all times, people! Have you ever heard of E. coli?

A frequently heard thing in the “anti” group is something along the lines of “there is zero evidence that xyz”, such as “there is zero evidence that food packaging is a significant source of infection” or “there is zero evidence that washing your food in soapy water is good for you” or whatever. This is typical “Null Hypothesis Significance Testing” type logic… Until we have collected a bunch of data rejecting the “null hypothesis” that “everything is just fine” then we should just “act as if everything is just fine”. Another way to put this is “until enough people have died, you shouldn’t take precautions to protect yourself”. Put that way it’s clearly UTTERLY irresponsible to “debunk” this video using that logic.

What we KNOW is that viruses are particles, essentially complex chemicals, which sit in droplets, which can be viable after floating in the air for 3 hours, which can settle out onto cardboard and be viable for 24 hours, and which can be viable for 3 days on plastic and steel. Guess what your groceries come in? Plastic bags, cardboard boxes, steel cans, plastic jars…

The assay used in the NIH study that established those timelines was to actually elute (wash) the virus off the surface and then infect cells in a dish with it and see how many were infected. It wasn’t just detecting the virus was there, but actually showing that it was active and viable.

So, there’s your evidence. There is *direct* laboratory evidence that the virus *can* be transmitted off the surfaces into cells and infect them.

Whether this is a significant source of infection or not is more or less irrelevant. How do you make a decision as to whether you should spend ~ 1hr every 2 weeks cleaning all your groceries?

Here’s the Bayesian Decision Theory:

Suppose two actions are possible: 1) do nothing, or 2) handle your groceries carefully and wash your fruits and vegetables in dish-soapy water

Costs of (1): probability p0 of getting infected from contaminated surface. We don’t know what p0 is, but leave it as a symbolic quantity for the moment. Let’s just use 0.5% chance of dying if you’re infected as the dominant problem, and a “statistical value of a life” as on the order of 10M dollars… so p0*.005*10000000 = 50000*p0

Cost of (2): probability of getting infected from a contaminated surface reduced to perhaps p0/100000, the same 0.5% chance of dying if you’re infected, plus 1 hr of cleaning time. So the cost is 0.5*p0 + w*1 where w is an “hourly wage”. Suppose you are willing to work for a median type wage, 50k/yr. This is $25/hr. So, what does the probability p0 need to be to “break even”? Ignoring the negligible quantity 0.5*p0, we have 50000*p0 = 25 so p0 = .0005. If you think there’s something like a .0005 chance or more that you could transmit virus from your grocery items to your face by “doing nothing” then YOU SHOULD BE CAREFUL and wash your items. For me, I’ll spend some time quarantining my groceries, and washing my produce… I also find it keeps the produce from spoiling and hence lasts longer in storage, so that should go into the “plus” side as well.
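The break-even calculation can be written out in a few lines of R. The numbers are the ones used above; the variable and function names are just for illustration, and 2000 working hours per year is the assumption that turns 50k/yr into $25/hr.

```r
p_death    <- 0.005          # chance of dying if infected
life_value <- 1e7            # "statistical value of a life" in dollars
reduction  <- 1e5            # careful handling divides p0 by this factor
wage       <- 50000 / 2000   # $25/hr, assuming 2000 working hours per year
hours      <- 1              # cleaning time

cost_nothing <- function(p0) p0 * p_death * life_value
cost_careful <- function(p0) (p0 / reduction) * p_death * life_value + wage * hours

## Exact break-even p0, without dropping the small 0.5*p0 term.
p0_break <- wage * hours / (p_death * life_value * (1 - 1 / reduction))
print(p0_break)
```

For any p0 above `p0_break`, `cost_careful(p0)` is lower than `cost_nothing(p0)`.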

As to what to wash your produce with. I’m using sudsy water from dye and fragrance free dish soap (main ingredients: Water, Sodium Lauryl Sulfate…). I’m washing my fruit and veg, and then rinsing it thoroughly. The quantity of soap I’m ingesting is substantially the same as if I hand washed a glass, rinsed it, and then filled it with water and drank it… It’s substantially less than you get from brushing your teeth with a typical toothpaste. If you are afraid of washing your dishes with soap, or of brushing your teeth, then by all means don’t wash your fruit with soap either… For the rest of us, do a good job rinsing just like you’d rinse your glasses or bowls before putting food in them.

Confusion about coronavirus testing and the role of testing capacity

2020 March 30
by Daniel Lakeland

Here’s some code to simulate a process whereby we saturate testing capacity… First the graphs:

Confirmed cases (blue) follow the real cases (red) so long as the cases per day are below the maximum… once we saturate, the green line (tests) increases linearly, and so does the blue line…
On the log scale, the green line (tests) parallels the blue (positive tests) as we saturate

library(ggplot2)

t = seq(1,40)
realcases = 100*exp(t/4)              # true cumulative cases, growing exponentially
realincrement = diff(c(0,realcases))  # true new cases per day

## each day roughly 4 people seek a test for every real new case
testseekers = rnorm(NROW(realincrement),4,.25)*realincrement

maxtests = 20000

## now assume that you test *up to* 20k people. if more people are
## seeking tests, you test a random subset of the seekers
## getting a binomial count of positives for the given frequency

ntests = rep(0,NROW(t))
ntests[1] = 100
confinc = rep(0,NROW(t))
confinc[1] = 100
for(i in 2:NROW(t)){
    if(testseekers[i] < maxtests){
        ## below capacity: every seeker is tested, every real case is confirmed
        confinc[i] = realincrement[i]
        ntests[i] = testseekers[i]
    } else {
        ## saturated: count the positives among maxtests randomly chosen seekers
        confinc[i] = min(realincrement[i],rbinom(1,maxtests,realincrement[i]/testseekers[i]))
        ntests[i] = maxtests
    }
}

cumconf = cumsum(confinc)
cumtests = cumsum(ntests)

df = data.frame(t=t,conf=cumconf,nt=cumtests,real=realcases)

## linear scale
ggplot(df) + geom_line(aes(t,conf),color="blue") + geom_line(aes(t,nt),color="green") + geom_line(aes(t,real),color="red") + coord_cartesian(xlim=c(0,35),ylim=c(0,400000))

## log scale
ggplot(df) + geom_line(aes(t,log(conf)),color="blue") + geom_line(aes(t,log(nt)),color="green") + geom_line(aes(t,log(real)),color="red") + coord_cartesian(xlim=c(0,30),ylim=c(0,log(400000)))
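To see why the confirmed count goes linear after saturation, it can help to strip the randomness out of the simulation. The sketch below is a simplified, deterministic companion to the code above: it assumes exactly 4 test seekers per real new case (the mean of the `rnorm` draw) and computes the expected confirmed cases per day. The names `seekers_per_case`, `expected_conf` and `first_saturated_day` are introduced here for illustration.

```r
maxtests <- 20000
seekers_per_case <- 4                     # the mean used in rnorm(..., 4, .25)
t <- 1:40
realincrement <- diff(c(0, 100 * exp(t / 4)))
testseekers <- seekers_per_case * realincrement   # noise removed for clarity

## Expected confirmed cases per day: all real cases while below capacity,
## otherwise the positive fraction of the maxtests tests performed.
expected_conf <- ifelse(testseekers < maxtests,
                        realincrement,
                        maxtests * realincrement / testseekers)

first_saturated_day <- min(which(testseekers >= maxtests))
print(first_saturated_day)
print(tail(expected_conf, 3))
```

Once testing is saturated the expected daily confirmed count is stuck at maxtests / seekers_per_case no matter how fast the real cases grow, so the cumulative confirmed curve becomes a straight line.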

The longer term outlook…

2020 March 10
by Daniel Lakeland

Coming out the other end of this whole COVID-19 thing… how do we do a good job of sustaining social distancing, and then returning sanely to productivity? The “flatten the curve” idea extends the amount of time one needs to be in “lockdown” but ultimately reduces deaths and severe morbidity… That’s good, but it starts to run into the “how long can we hole up?” question. If things go crazy through the roof, like in China, the duration is shorter. Data here shows that going from “oh shit” to a relatively small per-day caseload took about 20 days in China.
That’s a bad thing, because that represents the really “peaked” shape that overwhelms healthcare facilities. Many people died who otherwise might not have…
But if we make that slower, then the peak also occurs later and the duration is longer: we might need, say, 80 days of rather intense social distancing to make that happen. If we figure lockdowns are going to start now and build up through the next 10 days (it’s already something WaPo, The Atlantic, etc. are saying)… and then we need 80 days after that… you’re talking 90 days, which is 3 months, and puts us starting to return to work around June 1.

Now let’s talk food supply. Unlike in China, this virus is spreading country-wide here. It’s not contained to a particular place. So mobilizing the national guard to bring food from the midwest to WA because people in the midwest are ok… is not a possibility. How do we feed our country for 80 days without people having to be in contact with each other? We need food delivery systems.

Fortunately, as people get the virus and then recover, they should be immune for at least some period of time. Recovery to the point that they’re not shedding the virus is however probably 30 days? Just a guess, we’ll have to see with serology and PCR combo tests (to test that someone had the virus at some point, and doesn’t shed it now).

This doesn’t help us a lot. We have to do 90 days of relative isolation, and during the first 30 days people are getting the thing and then over the next 30 days those early people are recovering… by the time we hit 90 days, if you haven’t gotten it, you’re running pretty lean on food and things even if you’re well stocked now (and most people really aren’t). Obviously we’ll need to distribute food throughout the 90 days. This is going to require coordination from govt I believe, otherwise we’ll have sick people out there handling food… not good.

Everything you need to know about what to do about Coronavirus

2020 March 9
by Daniel Lakeland

You need to stop interacting with people. And I’m not joking about this.

Here are the facts out of Italy: about 10% of tested positive cases require ICU ventilation. The death rate for people under age 65 is probably only ~ 1% **if you get the ventilators to the 10% needing ventilation**… If you overwhelm the hospitals, the death rate will go to ~10%, which is on the order of 10x as bad as pandemic influenza in 1918.

The current trending idea is #flattenthecurve to describe to people HOW IMPORTANT it is to start *NOW* avoiding the spread of the disease. This avoidance of overloading the infrastructure is a core idea in Civil Engineering (my PhD is in CE).

Reducing the spread of the disease is not important just because fewer people will eventually get it (though that is probably true) but because the peak number of people who need ventilators and other intensive type care will be lower, so that fatality rates can stay low. If all the ICU beds are full, and 300 patients show up needing ICU today… all 300 patients will die. Since 10% of cases may need ventilators, it’s a serious situation.

Does social distancing, closing schools, etc work? Evidence out of 1918 says HELL YES: Unfortunately servers are getting swamped, so the best way for me to link you to this info is via twitter, who will probably stand up to the pounding.


So, what do you need to do? TODAY make plans to not be at work by the end of the week. Why? Because the virus is doubling the number of symptomatic verified cases outside china about every 2-4 days, let’s call it 3 days. And, btw it takes 5 days to onset of symptoms and for many people ~ 10 or 15 days before they say “hey I need to go to the hospital” (though for the elderly… it can be like 1hr after onset of fever). So, whatever’s going on in a hospital near you… it’s maybe what was the case 3 or 4 doubling periods ago, so today it’s on the order of ~ 10x worse than that. 10 days from now, it will be 100x worse already, but that will show up at the hospital about 20 days from now.
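The “10x” and “100x” figures follow directly from the doubling time. Here is a small R sketch of that arithmetic, assuming the 3-day doubling time used above; the function name `growth` is just for illustration.

```r
doubling_days <- 3
growth <- function(days) 2^(days / doubling_days)

growth(3 * doubling_days)   # 3 doubling periods: 8x
growth(4 * doubling_days)   # 4 doubling periods: 16x
growth(10)                  # 10 more days: roughly another 10x
growth(10) * growth(10)     # hospital lag plus 10 more days: roughly 100x
```

So a hospital picture that lags reality by 3 or 4 doublings understates today by a factor of about 8 to 16, and each further 10 days multiplies that by about 10 again.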

Early, proactive and significant reduction in interaction with other people WORKS and is one of the only things we can do. So we WILL be doing it. If we wait, we’ll be doing it AND have a massive tragedy. If we start now, we’ll be doing it but have less of a massive tragedy. The boulder is rolling down the hill, we can start walking off the path now, or get hit.

Back of the Envelope Cost-Benefit on pulling your kids from school

2020 March 4
by Daniel Lakeland

It is clear that COVID is spreading in communities in Northern California and Washington. The time until it is confirmed to be spreading in SoCal is probably a few days. It will always be confirmed *after the fact*, which means it is probably spreading in the SoCal community at the moment, though in the early stages. Outside China cases are increasing exponentially with a doubling time of about 5 days, give or take, which you can read off the graph at several websites such as the linked map site (click logarithmic graph on the lower right graph, read off the yellow dots for outside China spread).

I personally view it as inevitable that PUSD will decide to close schools. I don’t know what their timeline will be, but as these are typically committee decisions and there is risk either way (too early vs too late) I expect them to be delayed until the choice becomes obvious. On a doubling every 5 days trajectory, that probably means somewhere in the range of 10 to 20 days from now (which would mean somewhere around 800 to 3000 cases in the US). Spring break being Mar 30, I could imagine they’ll try to stay open til the 25th or so, and then not reopen after spring break. Though more pro-active decision making might mean closure in the next 5-10 days or so now that Pasadena has declared a state of emergency. All this is more or less my own opinion based on reading the growth charts, and seeing the responses from large organizations canceling conferences and things.

Now, at what point is it actually logical to pull your kids from school? I’m going to do this just for a family with a stay at home parent, because the calculation for lost days of work is much harder and depends on a lot of factors. We can back of the envelope calculate this as follows: the cost of a lost day of education is on the order of a hundred dollars a day. Let’s say $20/hr x 6hr/day = $120/day. If the stay at home parent can provide some of this education, the cost might drop to say $50/day…

Now, what are the costs associated with sickness? Let’s just do the calculation where one parent gets seriously ill and dies. For a child in elementary school let’s just put this around say $10M.

Now, what’s the chance of death if you have definite exposure? It’ll be something like 100% chance of getting sick and 0.5% chance of death (assuming parent doesn’t have underlying conditions and isn’t unusually old)… So the expected cost is $10M * 0.005 = $50000… At $50/day of lost education, that’s $50000 / $50 = 1000 days, so by this logic you should be willing to avoid that by pulling your kids from school about 1000 days early. Of course, it’s way too late to be 1000 days early, so basically you should pull your kids from school TODAY.

Now, suppose you have a job making $100k/yr, and you just get cut off from that job. That’s $385/day (which you don’t take home all of, but whatever). So if you add $50/day to that for educational loss, you should be willing to pull your kids about 115 days early. It’s also too late for that… So again, pull your kids TODAY.
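Both back-of-the-envelope numbers can be reproduced in a few lines of R. The inputs are the ones used above; the one added assumption is 260 working days per year, which is what turns $100k/yr into roughly $385/day.

```r
expected_loss <- 10e6 * 0.005          # $10M x 0.5% chance of death = $50,000

## Stay at home parent: only the $50/day educational loss counts.
days_home_parent <- expected_loss / 50

## Working parent: lost income plus the educational loss.
daily_income <- 100000 / 260           # assumes 260 working days per year
days_working_parent <- expected_loss / (daily_income + 50)

print(c(home = days_home_parent, working = round(days_working_parent)))
```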

Any way I back of the envelope this, it’s time to pull your kids from school… I don’t see a big enough flaw in all these calculations that would lead to waiting another 20 days.

Bayesian Decision Theory and Coronavirus

2020 March 4
by Daniel Lakeland

You’d have to seriously be living under a rock to not know about Coronavirus… But no matter how much you know about it at the moment, you probably don’t really know what we should do about it as a society. I mean, what are the various factors involved, should we close schools, churches, sporting events… what to do at nursing homes? Who should go to work and who should stay home? How would they afford it?

This is because, although all those questions are actually answerable to some extent (probabilistically at least), there isn’t a group tasked with doing the analysis. It would be a good idea to have one. Like, what the heck is the WHO doing if not at least staffing say 10 people who develop disease modeling software, and have several racks of computers to run Monte Carlo scenarios?

Well, whatever, if they were going to hire some people to do this stuff, what does the analysis look like? Here’s the general idea:

  1. Describe the factors that are associated with costs…
    1. Loss of Quality Adjusted Life Years (QALYs). This is the cost associated directly with “you don’t feel well for N days” all the way up to early death… The direct real-world cost of loss of healthy time.
    2. Loss of productivity: people who are sick don’t provide services to other people, they don’t produce goods, etc.
    3. Cost of treatment: people who are sick require other people to take care of them. They require medicines. Etc etc.
  2. Describe the factors associated with reduction of cost, or creation of benefits (or increasing costs above what they otherwise might be):
    1. Treatment of a person may shorten their sickness time.
    2. Treatment of a person may avoid them spreading the disease.
    3. Quarantine or Social Distancing may reduce spreading rate.
    4. Fast spreading rate may result in overwhelming local medical care, resulting in lack of care and much worse symptoms even death.

Once we put all these different factors into a model of the costs of any given scenario, we have the structure for a decision, but we still don’t know what the right values are for the parameters. For example, what’s the right cost of loss of worker time in India, how about in Vietnam… in Canada? How about the cost of health care, or the number of hospital beds etc? One needs to collect data, and estimate quantities. Some quantities will need to be estimated during the outbreak, like the growth rate of the number of cases in each country and the effect on this growth rate of different kinds of responses… Some numbers we will never know particularly accurately, but we will need to “borrow strength” from estimates across nearby regions, or similar cultures.

So, after specifying all that… we need to run a tremendous number of simulations, using the posterior distribution of the estimated quantities to predict the costs of different responses. From this we will get a variety of distributions over costs for different scenarios, and can calculate what seems to be the best response choice. If we make that choice, we continue to collect data and figure out what is going on, going forward, and continue to estimate what is the best choice… possibly changing the response through time as it becomes clearer what works. There’s some reason to think that we should try different responses in different places, so as to collect information about what might work, and then switch people to the apparently most effective thing as time goes on.
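As a toy illustration of that simulate-then-choose loop, here is a minimal R sketch. Everything in it is made up for the example: the “posterior” draws are just lognormal stand-ins rather than the output of a fitted model, and the three responses, their effects on growth, and their fixed costs are hypothetical numbers, not estimates.

```r
set.seed(1)
n_sims <- 10000

## Stand-in "posterior" draws (assumed, not estimated from data).
growth_rate   <- rlnorm(n_sims, log(0.2), 0.3)    # per-day growth with no response
cost_per_case <- rlnorm(n_sims, log(1000), 0.5)   # arbitrary cost units per case

## Hypothetical responses: fraction by which each cuts growth, and its own cost.
responses <- data.frame(name = c("nothing", "distancing", "lockdown"),
                        growth_cut = c(0, 0.4, 0.8),
                        fixed_cost = c(0, 2e5, 2e6),
                        stringsAsFactors = FALSE)

horizon <- 30          # days simulated
initial_cases <- 100

## Cost of one response under every posterior draw.
simulate_cost <- function(cut, fixed) {
  cases <- initial_cases * exp(growth_rate * (1 - cut) * horizon)
  cases * cost_per_case + fixed
}

## n_sims x 3 matrix: one column of simulated costs per response.
costs <- mapply(simulate_cost, responses$growth_cut, responses$fixed_cost)
expected_cost <- colMeans(costs)
names(expected_cost) <- responses$name

best <- names(which.min(expected_cost))
print(expected_cost)
print(best)
```

In a real analysis the columns of `costs` are the “distributions over costs” mentioned above, and the whole thing gets re-run as new data updates the posterior.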