On models of the stopping process, informativeness and uninformativeness

2017 June 24
by Daniel Lakeland

I had a great conversation with Carlos Ungil over at Andrew Gelman's blog where we delved deep into some confusions about stopping rules and their relevance to Bayesian Inference. So here I'm going to try to lay out what it is I discovered through that conversation, and I'm going to do it in the context of Cox/Jaynes probability with explicit background knowledge, translating from what I consider the much more nebulous terms like "deterministic" or "random".

Standard Textbook Stopping Problem:

First off. Let's talk about the "standard textbook" type stopping rule: "Flip a bernoulli coin with a constant p until you see 3 heads in a row and then stop" Suppose you get 8 observations HTTHTHHH. Now, the rule is completely known to you, and so you can determine immediately by looking at the data whether a person following the rule will stop or not. If someone hands you this data and say "given what you know about the rule, will they stop?" you will say P( STOP_HERE | Data, Rule) = 1. *for you* the rule is deterministic thanks to the textbook description. Under this scenario, knowing that they stopped at N=8 does not add anything that you didn't already know. Therefore it can't add any information to the inference. Basically

P(Parameters | Data, Rule, STOP) = P(Parameters | Data,Rule)

Standard Real World Stopping problem:

In the real world, the reason why people stop is rarely so clearly known to you. The experimenter tells you "Our collaborators ran a few trials and read the protocols from another lab, and tried this thing out, and it seemed to work, and so we tried it in our lab, and after collecting 8 samples we saw that the results were consistent with what we'd been told to expect, and so we stopped."

Or a survey sample of a population comes with a dataset of answers from 120 people, and a description of the protocol: "we ran a small preliminary study to test out our questions, and found in the preliminary study that the phrasing of the questions didn't seem to matter, and so we used the observed variability in this previous study to determine that a sample of 120 people should give a cost effective dataset for answering the questions we asked, we then sampled a fixed 120 people" but... because the preliminary study was based on a mixture of different questions and soforth... it's not included in the data set. Here, the information gleaned in the earlier trial study is expressed only through the choice of 120 people. The "fixed deterministic" rule "sample 120 people" is informative for a bigger model in which you include the earlier steps.

Or a slightly more sophisticated version: "we sampled until our Bayesian model for the probability of parameter q being in the range 0-0.1 was less than 0.01, ie. p(q < 0.1) < 0.01" To the people running the study, a Bayesian posterior is a deterministic function of the data and the model. Everyone who has the same model always calculates the same posterior from the given data. But note, *you* don't know what their Bayesian model was, either priors or likelihood.

Deterministic vs Random stopping rule vs Cox/Jaynes restatement

In the literature, a stopping rule is called "informative" if it is "a random stopping rule that is probabilistically dependent on a parameter of interest". I personally think this is a terrible definition, because "random" is usually a meaningless word. It gave me a lot of trouble in my conversation with Carlos, because when I think random I pretty much exclusively use that in the context of generated with a random number generator... but that's not what is meant here. So let's rephrase it in Cox/Jaynes probability terminology.

A stopping rule is informative to you if given your background knowledge, and the data up to the stopping point, you can not determine with certainty that the experiment would stop, and there is some parameter in your model which would cause you to assign different probabilities to stopping at this point for different values of that parameter.

Under this scenario, the *fact of stopping* is itself data (since it can't be inferred with 100% accuracy just from the other data).

Now in particular, notice that the informativeness of a stopping rule depends on the background data. This should be no surprise. To a person who doesn't know the seed of a random number generator runif(1) returns a "random" number, whereas to a person who knows the seed, the entire future stream of calls to runif is known with 100% certainty. It's best to not use the term random and replace it with "uncertain" or better yet "uncertain given the knowledge we have".

What can you usually infer from the stopping rule?

If you're in a situation where the stopping rule is uncertain to you, then if you believe it is helpful for your modeling purposes, you can add into your model of the data a model for the stopping rule (or a model for the choice of the "fixed" sample size). This is particularly of interest in real world rules along the lines of "my biologist friends told me what I should be getting from this experiment so I tried about 8 experiments and they kind of seemed to be consistent with what I was told, so I stopped". The actual rule for whether the experimenter would stop is very nebulous, but the fact of stopping might tell you something you could use to describe distributions over relevant real-world parameters. For example, suppose there's an issue with the way a drug is delivered that can cause toxicity if you do it "wrong". The fact that the biologist stopped after 8 experiments suggests that they believe p(DetectToxicity | DoingItWrong) is near 1, so that if you haven't seen it in 8 tries then you are virtually certain you are doing it "right".

So, eliciting information about the stopping rule is very useful because it can show you that there are potentially parameters you need to include in your model for which the fact of stopping informs those parameters, and particularly, parameters *that describe the nebulous uncertain rule*.

In the example above about sampling until a Bayesian posterior distribution excluded the range 0-0.1 with 99% probability, if someone tells you exactly the description of the Bayesian model, then if you plug in the data, you will immediately know whether the rule said stop or continue. But, if you know what the likelihood function was, but not the prior, then you could potentially infer something about the prior that was used from the fact that the sampling stopped at this point. If you think that prior was based on real background information, this lets you infer some of that real background info.


To a Cox/Jaynes Bayesian there is no such thing as "random" only "uncertain given the background information". A stopping rule can teach you something about your parameters precisely when your model doesn't predict the fact of stopping with probability = 1 *and* your model has a parameter which affects the probability of stopping conditional on data in other words, p(STOP_HERE | Data_so_Far, Background, Params) is a non-constant function of Params

Then, given the data, the fact that the experiment stopped is additional information about some of the Params

Stopping rules are not irrelevant to Bayesian models, but they are only relevant in those circumstances, if you feel that the stopping rule seems vague or the reasons for the choice of the sample size seem based on some information that you're not privy to, then you might want to invent some parameters that might help you explain the connection between the fact of stopping, and the things you want to know such as "hidden" prior probabilities in use by the experimenter that inform their stopping.


One Response leave one →
  1. Daniel Lakeland
    June 24, 2017

    The thing that becomes obvious in the restatement is that the fact of stopping is data just like any other data. And its informative precisely when the data is not redundant with other knowledge and whatever your model is predicts different probabilities for that data for different regions of parameter space.

    Your model might even predict 100% probability to stop for some regions of parameter space. And then the stopping rule is a deterministic given the parameter value but perhaps only 10% given a different value. There's no mysterious random powers inherent in the rule, there's only stuff you know for certain and stuff you're uncertain about.

Leave a Reply

Note: You can use basic XHTML in your comments. Your email address will never be published.

Subscribe to this comment feed via RSS