Measured and imputed fall times from Bayesian model and predictions via ODE with drag

Well, the results are in, and we have a nice set of preliminary predictions for the falling ball problem (a more massive MCMC run would be in order next). They also show some very nice advantages to a Bayesian analysis of the problem. In the above graph are measured fall times (black points) predicted fall times without drag (black curve), a linear fit to the raw data (grey line/area), corrected fall times including imputed timing bias (red points) and predicted fall times from the ODE model for the initial conditions and the aerodynamic radius. Bayesian uncertainty not shown (I used the mean values from the MCMC runs).

One of the nice aspects of this analysis is how prior information helped determine which sources of uncertainty were important. For example, we could obviously fit the original data pretty well if we had no drag and a little extra gravity (imagine the black curve a little lower). In fact fitting the model

> linmod1 <- lm(ts ~ 0+I(sqrt(2*h0cm/100)),data=drops);
> 1/coef(linmod1)^2
I(sqrt(2 * h0cm/100))
12.15669

But gravitational acceleration of 12.2 m/s^2 is completely out of the question, we know thanks to others work that g in LA is within about 0.01 of 9.796 and we know from dropping something denser than a paper ball that the paper ball takes longer due to drag, maybe around .1 or .2 seconds longer for these heights. We could include this timing bias into the model easily:

> linmod2 <- lm(ts ~ I(sqrt(2*h0cm/100)),data=drops);
> 1/coef(linmod2)[2]^2;
I(sqrt(2 * h0cm/100))
7.101245
> coef(linmod2)[1]
(Intercept)
-0.1371857

But although a timing bias of 0.14 seconds is not unreasonable (the Bayesian method predicts 0.275 s), g=7.1 is also completely out of the question. This reflects the fact that the shape of the point cloud isn’t like a $$\sqrt{2h_0/g}$$ curve because of drag (it’s more linear because we’re transitioning to a constant terminal velocity by the time we hit the ground).

To include the ODE predictions in the least squares type model we need to first run the ODE predictor for each drop. To do that, we need some value for the aerodynamic radius, and we need to know the initial conditions, neither of which do we actually know. Oh sure, we could assume $$r$$ is pretty close to half the average measured diameter for example, but the linear model needs just one r value and we are not comfortable saying that r is any definite quantity based on the variability in size and shape, so we’re going to need to do “sensitivity analysis” on r. Also, we know that there’s some kind of uncertainty in the initial height, oh sure I tried to hold it very close to the height requested, but the shape isn’t a sphere with a precisely defined edge, it’s a crumpled ball of paper. Plus there’s the $$\rho$$ and $$\mu$$ values for air, fixed during the experiments, but not known exactly due to temperature and humidity and soforth. So again, we are going to have to run “sensitivity” analysis on all of these things. How will we choose the heights and radii (and air properties) to use in our analysis? Will we, as Gelman rails against, invert a hypothesis test, and accept any r,h pair which we can not reject? That will probably be a pretty big range considering that our above analysis is willing to put g between 7.1 and 12!

The Bayesian analysis goes one step further than the frequentist analysis in allowing us to specify that some values are more reasonable than others, namely g tightly coupled around 9.796 and $$\mu,\rho$$ somewhat less tightly coupled around their nominal values, so that we expect more uncertainty in $$r,h$$. Furthermore, if we like we could couple these values together. A ball with a more irregular shape will be harder to hold at an exact height for example.

So how about calibration? One of the complaints from Frequentists is that predictions from Bayesian analysis do not necessarily have well calibrated probabilities for reproducible experiments. For example if I took my fitted model and predicted fall times for each ball for a new fall height, and then ran the new experiment a bunch of times to get a lot of data, would the distribution of measured values be pretty close to the distribution of predictions from the Bayesian model? Well, it’s hard to say without running the experiment. But my intuition says that after all the sensitivity analysis and inversion of hypothesis tests and soforth, any Frequentist confidence intervals would be pretty large, what with having to include different g values and r  values and soforth. And if you are including different values for $$g,r,h_0$$ but not assigning them prior probabilities, what do you do for confidence intervals? You’ll get one for every different value of $$g,r,h_0$$ so it’s not clear which one to evaluate for advertised coverage.

All this is to say, I use Bayesian methods because they answer the question that I tend to want answered. Namely, if I can make some reasonable assumption (to me based on my general knowledge) then Bayes theorem will give me the resulting weights to be placed on different values for quantities that I do not know exactly (things like aerodynamic radius, fall height, air density, timing bias, timing variance etc).

One question that I don’t know how to answer is how to deal with calibration issues in the absence of an enormous validation sample. But this is a problem I have no matter what. When the sample size is enormous we can be pretty sure about our predictions. When I have 30 drops for 10 balls and I predict the future from that, I’m only going to know how well calibrated my predicted fall time distribution is after I go and do a whole bunch more drops. Of course then I’d want to update my model with those additional data points, and I’d have to collect even more data to see how calibrated I am… ad nauseum.