Gelman posts about a paper in which social scientists run a quadratic regression and discover that it produces a U shape!!! Amazing. The story is that too much talent on your sports team will lead to poorer performance. Further evidence that understanding function approximation and basis expansion is seriously flawed among many researchers.

Uri and Leif weren't able to get the data, but were able to get the original researchers to run a test of a hypothesis. The idea is they take their quadratic regression, find the maximum or minimum point, and split the data at that point, running two linear regressions, and testing whether the sign of the slope is different. This is a bit better than assuming the quadratic means there really is a U shape, but basically still wrong. The problem is NHST of a slope against zero isn't the right way to figure out whether you have a U shape. In fact NHST isn't the right way to do much of anything (except test the output of PRNG algorithms).

I hear you asking: Ok Dan, well then, how should we do this right?

First off, Uri and Leif have some data on competition vs patents and claim that this is a good testbed for their method. I disagree because I don't think number of patents is a good measure of "creativity" and there are serious problems with that analysis even before you have the data. The sports data on the other hand is pretty good conceptually, but the authors didn't provide the data so I can't just show you the kinds of stuff you ought to do...

So I'll have to resort to just telling you. The first thing is that you need a system which is sufficiently flexible to fit whatever nonlinear shape your data "really" has. With the sports data the x axis is logically limited to the range [0,1] (0 to 100% of the team is high-talent). This is a bounded interval, and polynomials are a complete basis for continuous functions on a bounded interval. That means, with sufficiently high order polynomials we can fit any nonlinear shape. However, with noisy measurements, we also have a problem of potentially over-fitting. Furthermore, we'll do better for mathematical reasons if we use orthogonal polynomials.

But, a complete orthogonal basis isn't necessarily the best way to go, especially in a Bayesian analysis. For example, we have the idea from sports fans who haven't seen the data that we should get a diminishing returns, horizontal asymptote type result. So, perhaps we should include such a family in our analysis. Unfortunately, I couldn't get Chebyshev polynomials into Stan because the Stan authors didn't really understand them even though someone DID do the work to include them. So, let's set up a model for y vs x using regular old non-orthogonal polynomial regression.

We're going to expect some correlations between the various coefficients, because we don't have an orthogonal system, but Stan is pretty good at working that out, in the end all we care about are the function shapes, not the coefficients anyway. The second term represents a sigmoid function which has horizontal asymptotes at 0 and h and which has a transition between them in the region of $x_0$. The polynomial represents a general purpose correction to this basic shape. It would be even better to have Chebyshev polynomials and to give strong priors to the coefficients $a_i$ for reasons I won't go into (but they do involve a less-correlated posterior distribution for the coefficients, easier to sample from). With the sigmoid function doing a reasonable job of representing our core expectation, we can at least put priors on the $a_i$ that are zero-mean, as we wouldn't be too surprised to find that large polynomial corrections aren't necessary.

Ok, so now, pry the data out of the researchers fingers and fit the model in Stan. Then, take a sample of 100 of the coefficient vectors and plot the curves in a spaghetti plot. Now, calculate the derivative of this function (if you haven't done calculus in a while that's ok, use Maxima or ask Wolfram Alpha or something similar). In a separate plot, spaghetti plot the derivative. It'll be noisier, as the derivative is less constrained by the data, but we've regularized our result by truncating at 5th order polynomials and we can regularize further by putting strong priors on the $a_i$. Spaghetti plotting the derivative, does it almost always have a positive value on the left and then dive down to a negative value on the right?

If you have those plots of the curve and its derivative, and they really do produce a fairly consistent change in the sign of the derivative, THEN you have reasonable evidence for a transition to negative slope. The posterior probability of a curve shape can be directly assessed in these plots. There's no need to "test" your hypothesis using an NHST test that is appropriate for assessing whether the coefficients are the output of an RNG. But, if you do find that out of the 100 spaghetti derivative plots all but 1 to 5 of them have entirely negative values at the high end, then, yes you have relatively strong evidence for your decreasing performance at the high end.

The fact is, the posterior distribution is what summarizes our uncertainty. A binary decision about rejecting a linear regression on the upper end is NOT the same thing for the same reasons that Uri and Leif give about why the quadratic isn't good as an indicator of a U shape.

One Response leave one →
1. June 28, 2016

An interesting idea, related to my posts on declarative models, would be to regularize this fit by putting a prior on say the mean squared second derivative. Approximate the integral((d^2f(x)/dx^2)^2,0,1) using a sum at say 10 points between 0 and 1 and place an exponential or maybe gamma prior on it, enforcing a soft preference for less wiggly functions. Gamma is good because it is expected to have some concavity, but it shouldn't change direction many times over the interval 0,1 even though with a 5th order polynomial and the sigmoid together, it could fit a fairly wiggly function.