Updated state by state graphs with bug fix

2020 May 17
by Daniel Lakeland

I discovered that the method I was using to smooth the cases per day had a bug.

The shapes of the cases-per-day curves were right, but the overall scale was reduced. Basically, what I was doing was convolving with the derivative of a smoothing kernel… But the derivative you’re trying to calculate is (f(x+dx)-f(x-dx))/(2dx), so when you’re averaging across multiple sizes of dx you need to take the 2dx factor into account for each one… fixed. Now the grey points are the raw data, the black line is the short-term smoothed data, and the blue line is the ggplot smoother.
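As a rough illustration of the scaling point (a minimal sketch, not the code behind the graphs), the snippet below averages central-difference slopes over several half-widths dx, dividing each difference by its own 2dx. The function names, the made-up series `cumulative`, and the choice of widths are all assumptions for illustration; it also assumes the input is a cumulative count, so that its slope is cases per day.

```python
def central_difference(values, i, dx):
    """Slope estimate at index i from points dx steps on either side."""
    return (values[i + dx] - values[i - dx]) / (2 * dx)


def averaged_slope(values, i, widths):
    """Average the central-difference slope over several half-widths.

    Each difference is divided by its own 2*dx before averaging, so
    wide and narrow windows all estimate the same quantity.
    """
    estimates = [central_difference(values, i, dx) for dx in widths]
    return sum(estimates) / len(estimates)


# Hypothetical cumulative series growing by exactly 3 per day.
cumulative = [3 * t for t in range(21)]
widths = [1, 2, 3]  # hypothetical set of half-widths

per_width = [central_difference(cumulative, 10, dx) for dx in widths]
slope = averaged_slope(cumulative, 10, widths)
print(per_width, slope)
```

On a series that grows linearly, every width gives the same slope, so the average has the right scale. If the differences were combined without their own 2dx factors, the result would keep the right shape but come out at the wrong scale.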

I wasn’t using this method for deaths per day, though maybe I should be now… In any case, here are the current versions.

4 Responses
  1. Jim Moore
    May 18, 2020

    Thanks for putting these out there and updating them. I think it is interesting to visualize the state-by-state differences in trends. I think there is more that you could do with the testing data that would be very interesting. The absolute numbers are interesting in themselves, but I think more interesting from a decision-making standpoint would be to look at what is happening to the fraction of positive tests through time, especially in conjunction with the rise or fall in daily cases. Say, for example, cases are going up: if the fraction of positive tests is also rising, it tells us a different story than if the fraction of positive tests is falling, especially if we are looking at a state that has recently changed quarantine protocols. The former is a cause for concern, a possible re-acceleration of transmission, whereas the latter is likely just due to the additional availability of tests (more tests = more positives) and not likely any increase in the true prevalence of the disease.

    • Daniel Lakeland
      May 18, 2020

      When you see a discontinuous change in testing, and combined with that a discontinuous change in cases, you can probably guess that they’ve started testing more people and now you’re just getting more ascertainment. But I don’t think this is a strong issue for most states with significant outbreaks.

      Testing is a complicated subject for sure, but as long as policy hasn’t changed dramatically recently, I think cases probably mirror the shape of the infection. Most places aren’t changing their testing policy every 5 to 10 days… and you can get a decent sense of the growth over that period.

  2. Mendel
    May 25, 2020

    Are the areas under the graphs constant, i.e. would I get the same number of total cases if I summed up the data or integrated the black or the blue line?
    A rolling average does have that property.

    • Daniel Lakeland
      May 25, 2020

      Yes, the bug fix was to calculate the rolling average properly. Actually I think I further “fixed” it after posting these.

      Ultimately the issue was that I was doing a rolling average, and then taking adjacent differences of the averaging kernel… but R didn’t know what to do at the edges of the averaging kernel, so it was dropping the edge cases from the new differentiation kernel… So I explicitly added some zeros to the averaging kernel, and then adjacent differences of it worked right.

      For example, suppose you had the averaging kernel [0.25,0.5,0.25]. When you tell R to do -diff of this, it returns [-0.25,0.25]. But that kernel is the same as the kernel [0,0.25,0.5,0.25,0], and the -diff of that would be [-0.25,-0.25,0.25,0.25], which is what you really want.

      I think I was using different averaging kernels, but you get the basic idea. Bugs.
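The padding example and the area question in this thread can be checked with a short sketch. It is written in plain Python rather than R, and the helpers `diff`, `neg_diff`, and `convolve_full`, along with the made-up series `daily`, are hypothetical stand-ins for illustration, not the code used for the graphs. `diff` here mimics the adjacent-difference behaviour of R's `diff`.

```python
def diff(xs):
    """Adjacent differences xs[i+1] - xs[i], one element shorter than xs."""
    return [b - a for a, b in zip(xs, xs[1:])]


def neg_diff(kernel):
    """Negated adjacent differences of a kernel."""
    return [-d for d in diff(kernel)]


def convolve_full(signal, kernel):
    """Full discrete convolution (output length len(signal)+len(kernel)-1)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


kernel = [0.25, 0.5, 0.25]
padded = [0.0] + kernel + [0.0]   # same kernel with explicit zero edges

unpadded_diff = neg_diff(kernel)  # the edge terms are lost
padded_diff = neg_diff(padded)    # the edge terms are kept

# Area check: a kernel that sums to 1 preserves the total under full convolution.
daily = [2, 4, 6, 8]              # hypothetical daily counts
smoothed = convolve_full(daily, kernel)

print(unpadded_diff)
print(padded_diff)
print(sum(daily), sum(smoothed))
```

The unpadded call reproduces [-0.25, 0.25] and the padded one gives [-0.25, -0.25, 0.25, 0.25], matching the example above. The last line illustrates why a normalized rolling average keeps the total: the sums agree when the convolution is allowed to run past both ends of the data. Near the edges of a real series, where the window is truncated, the totals may differ slightly.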
