What Science News has to say on statistics in science

2010 March 14
by Daniel Lakeland

Odds Are, It’s Wrong.

Well, something like that. Here’s my take on this topic, in bullet form:

  • Most scientists are stuck on “hypothesis testing”, which is a very poor form of statistical modeling.
  • Please don’t ever report a p value. It just doesn’t mean anything much of interest.
  • Bayesian interpretations of statistics are better, but there is no magic bullet.
  • Models, Models, Models, Models.

Basically, the usual scientific study goes like this, especially in biology or the social sciences, where variability is large and generally not entirely under the experimenter’s control.

  1. Collect some data that is more or less vaguely related to something of interest.
  2. Look at the data and decide what your question is. Your question can be in the form of a t-test of difference of means, or a linear model involving several measurements, or a linear model on some transformed version of your data (e.g. logistic regression).
  3. Plug said question into canned statistical routine in Excel or some such software.
  4. Hope for small p value.
  5. Hopefully report a “significant” result, and apply for more funding.

Contrast this with what I would call the “gold standard for scientific method”:

  1. Collect some data related to your topic of interest.
  2. Look at the data and use it to formulate a general class of possible models for the process of interest.
  3. Using your preliminary data, fit a posterior distribution for the values of the parameters in your model.
  4. Compare the fit of your model to your data, and revisit model specification until your class of models is sufficiently general that it encompasses all effects you believe are worth considering.
  5. Design a study which specifically addresses the various effects your model would predict.
  6. Using the data from the study, and your posterior distribution from the preliminary study, refit your model parameters and check for model predictive power on a reserved subset of your study data.
  7. If the model has predictive power and predicts effects that are nontrivial, publish your model and the raw data you used to fit it in a form that someone can use to replicate the calculations or to use your data as input to an alternative model.
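
Steps 3 and 6 of this list can be sketched in a few lines. This is only a toy illustration, assuming the quantity of interest is a single success rate with a conjugate Beta prior; the `Beta` class and every count in it are made up for the example, and a real model would have far more structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(a, b) distribution over an unknown success probability."""
    a: float
    b: float

    def update(self, successes: int, failures: int) -> "Beta":
        # Conjugate update: a binomial likelihood keeps the posterior in the Beta family.
        return Beta(self.a + successes, self.b + failures)

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)


# All counts below are hypothetical.
flat_prior = Beta(1.0, 1.0)
pilot_posterior = flat_prior.update(successes=7, failures=13)        # step 3
study_posterior = pilot_posterior.update(successes=30, failures=70)  # step 6, fitting portion

# Step 6, checking portion: compare the prediction with a reserved subset.
holdout_successes, holdout_n = 16, 50
predicted_successes = study_posterior.mean * holdout_n
print(f"posterior mean rate: {study_posterior.mean:.3f}")
print(f"predicted {predicted_successes:.1f} successes in holdout, observed {holdout_successes}")
```

The point of the sketch is that the posterior from the preliminary data becomes the prior for the designed study, and that part of the study data is held back to test predictions rather than to fit parameters.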

Basically no-one in science meets the gold standard at all levels, but some get much closer than others. In particular, the difference between the gold standard and the typical process is in the specification of the model, the method of evaluating the model’s fit, and the fact that the experimental design is specifically tailored to investigating the range of model predictions. A p value tells you almost nothing compared to confidence intervals (or highest posterior density intervals in Bayesian terms) and the covariance of the parameters in your model. Those can be used to determine how big the effects will be from manipulating the experimentally controllable variables, and that is in general what we want to know.
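
To make the contrast between a p value and an interval concrete, here is a rough sketch. It assumes two groups of equal size with a known, common standard deviation and uses a normal approximation; the function name `diff_summary` and all the numbers are invented for the illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_summary(mean_a, mean_b, sd, n, level=0.95):
    """Difference in means, its normal-approximation interval, and a two-sided p value.

    Assumes n observations per group and a known common standard deviation sd.
    """
    std_normal = NormalDist()
    se = sd * sqrt(2.0 / n)                      # standard error of the difference
    diff = mean_b - mean_a
    z = std_normal.inv_cdf(0.5 + level / 2.0)    # about 1.96 for a 95% interval
    p = 2.0 * (1.0 - std_normal.cdf(abs(diff) / se))
    return diff, (diff - z * se, diff + z * se), p


# Hypothetical studies: a large effect measured on few subjects,
# and a tiny effect measured on very many subjects.
small_n = diff_summary(100.0, 105.0, sd=10.0, n=32)
big_n = diff_summary(100.0, 100.25, sd=10.0, n=32000)

for label, (diff, (lo, hi), p) in (("small study", small_n), ("big study", big_n)):
    print(f"{label}: diff={diff:.2f}, interval=({lo:.2f}, {hi:.2f}), p={p:.4f}")
```

Under these assumptions the big study gets the smaller p value, yet its interval shows that the whole plausible range of the effect is a fraction of one unit. The interval tells you how big the effect could be; the p value alone does not.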

I would like to add that proper nondimensionalization of the model can lead to much better specified models and much more interpretable results. However, when it comes to assessing practical significance, the scale matters. Reducing the risk of a certain cancer by 40% sounds great, but if it’s a cancer that affects 1 in 1 million people, the reduction has much less value than if it’s a cancer that affects 1 in 100 people.
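
The arithmetic behind the cancer example is worth spelling out. The sketch below uses the 40% reduction and the two baseline rates from the paragraph above; the population of one million and the helper name `cases_prevented` are assumptions added for illustration.

```python
from fractions import Fraction


def cases_prevented(baseline_risk, relative_reduction, population):
    """Expected cases avoided: absolute risk reduction times population size."""
    return baseline_risk * relative_reduction * population


reduction = Fraction(40, 100)   # the 40% relative risk reduction
population = 1_000_000          # hypothetical population size

rare = cases_prevented(Fraction(1, 1_000_000), reduction, population)
common = cases_prevented(Fraction(1, 100), reduction, population)

print(f"rare cancer:   {float(rare)} cases prevented per million people")
print(f"common cancer: {float(common)} cases prevented per million people")
print(f"ratio: {float(common / rare)}")
```

The relative reduction is identical in both cases, but the absolute benefit differs by a factor of 10,000.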
