The idea is this: around 1914, Emile Borel calculated that moving a single gram of matter in a certain way in a star several light-years away would perturb the trajectories of the molecules of an ideal gas in a lab enough that, after a few seconds, any given molecule would have diverged dramatically from the trajectory you would calculate without the perturbation. (To find citations on this I relied on this blog, which ultimately says that the basic idea was described in Leon Brillouin’s 1964 book “Scientific Uncertainty and Information” (p. 125), where he cites Emile Borel’s 1914 book “Introduction géométrique à quelques théories physiques” (p. 94) for Borel’s calculation.)

This is a manifestation of the basic idea of Sensitive Dependence on Initial Conditions (aka the Butterfly Effect). A measure of this phenomenon is called the Lyapunov Exponent. Basically, for small times after the initial point where your model begins to track the molecules, the error in your calculation grows exponentially, like $$\exp(\lambda t)$$ for some $$\lambda$$. Exponential growth is of course very fast, so whenever $$\lambda$$ is positive your calculation becomes useless after some amount of time: there is always some error in your starting point, and it rapidly blows up.
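As a toy illustration of this kind of error growth (not Borel's gas calculation), the sketch below uses the logistic map $$x_{n+1} = 4 x_n (1 - x_n)$$, a standard chaotic system whose Lyapunov exponent is known to be $$\ln 2 \approx 0.693$$ per step. The function names, the starting point, and the size of the initial error are all assumptions made for the example.

```python
import math


def logistic(x, r=4.0):
    """One step of the logistic map, chaotic at r = 4."""
    return r * x * (1.0 - x)


def separation(x0, delta0, steps, r=4.0):
    """Track |a_n - b_n| for two trajectories started delta0 apart."""
    a, b = x0, x0 + delta0
    gaps = []
    for _ in range(steps):
        a, b = logistic(a, r), logistic(b, r)
        gaps.append(abs(a - b))
    return gaps


def lyapunov_estimate(x0, steps, burn_in=100, r=4.0):
    """Average log|f'(x)| along a trajectory, where f'(x) = r * (1 - 2x)."""
    x = x0
    for _ in range(burn_in):
        x = logistic(x, r)
    total = 0.0
    for _ in range(steps):
        # Guard against log(0) in the unlikely case x lands exactly on 0.5.
        total += math.log(max(abs(r * (1.0 - 2.0 * x)), 1e-300))
        x = logistic(x, r)
    return total / steps


if __name__ == "__main__":
    gaps = separation(x0=0.2, delta0=1e-12, steps=60)
    for n in (0, 10, 20, 30, 40, 50):
        print(f"step {n + 1:2d}: separation = {gaps[n]:.3e}")
    print("estimated Lyapunov exponent:", lyapunov_estimate(0.2, 10_000))
    print("ln 2:", math.log(2.0))
```

An initial error of $$10^{-12}$$ roughly doubles each step, so after a few dozen steps the two trajectories are as far apart as two unrelated points, which is the sense in which the calculation becomes useless.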

What does this have to do with Null Hypothesis Significance Testing (NHST)? Well, a common way to utilize this “paradigm” is to collect some data, experimental or otherwise, do some analysis, test a “Null Hypothesis” that the effect of X on Y is zero on average, and then find that the effect is either “statistically significant” or “not statistically significant”. This is taken as evidence that “X affects Y” or “X does not affect Y”, respectively.

What the Borel calculation tells us is that “everything affects everything else to some extent”, so the null hypothesis of an exactly zero effect is essentially never true. Logically, then, in this kind of NHST analysis there are only “true positives” and “false negatives”. If you find a non-significant effect, it’s just because you didn’t collect enough data.
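To make the "not enough data" point concrete, here is a minimal sketch that computes the power of a test as the sample size grows. It assumes the simplest possible setting, a two-sided one-sample z-test with known unit variance, and a made-up true effect of 0.01 standard deviations; neither the test nor the numbers come from any particular study.

```python
from statistics import NormalDist


def power_two_sided_z(effect, n, alpha=0.05):
    """Probability of rejecting H0: effect = 0 with n observations.

    `effect` is the true mean in units of the (known) standard deviation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect * n ** 0.5  # mean of the z statistic under the true effect
    upper_tail = 1.0 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


if __name__ == "__main__":
    tiny_effect = 0.01  # hypothetical: nonzero but very small
    for n in (100, 10_000, 100_000, 1_000_000):
        print(f"n = {n:>9,d}: power = {power_two_sided_z(tiny_effect, n):.3f}")
```

With a small sample, the test rejects barely more often than the nominal 5% rate, and with a large enough sample it rejects almost always. The effect itself never changed; only the amount of data did.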

Rethinking this paradigm logically, we should really be asking questions like “how big is the effect of X on Y?” and “what other effects are there, and how big are their contributions, so that we can determine whether we need to take X into account in order to make a useful description of Y?”

Rethinking things in this manner immediately leads you to a Bayesian viewpoint on statistics: find out which sizes of the effect of X are plausible given our model and our data, and then determine whether we can safely neglect X relative to the other factors in our model.
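One way this could look in practice is sketched below, under strong simplifying assumptions: the effect of X is a single number, the noise is normal with known standard deviation, and the prior on the effect is normal and centered at zero. The threshold for "negligible" (0.05 here) is a hypothetical value that would have to come from comparing X with the other factors in the model.

```python
from statistics import NormalDist


def posterior_effect(xbar, n, sigma=1.0, prior_sd=1.0):
    """Posterior for the effect under a conjugate normal-normal model.

    xbar: observed mean effect; n: sample size; sigma: known noise sd;
    prior_sd: sd of a zero-centered normal prior on the effect.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_precision = prior_precision + data_precision
    post_mean = data_precision * xbar / post_precision
    return NormalDist(post_mean, post_precision ** -0.5)


def prob_negligible(posterior, threshold):
    """Posterior probability that the effect lies within +/- threshold."""
    return posterior.cdf(threshold) - posterior.cdf(-threshold)


if __name__ == "__main__":
    # Hypothetical data: a tiny observed effect measured on a huge sample.
    post = posterior_effect(xbar=0.01, n=1_000_000)
    print("posterior mean:", post.mean)
    print("posterior sd:  ", post.stdev)
    print("P(|effect| < 0.05):", prob_negligible(post, 0.05))
```

In this made-up example a significance test would reject the null decisively, since the observed effect is about ten standard errors from zero, yet the posterior puts nearly all its mass on effects small enough to neglect. The question being answered is about size, not about whether the effect is exactly zero.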