abs(x-y) < 0.001 and abs(f(x) - f(y)) < 0.001. Once you've decided which of these you think is the important "rule," you can keep things invariant by defining a transformation property of the delta function... But until you come to a principled decision you have this indeterminate ratio to contend with. Essentially: if abs(x-y) < 0.001 then, to first order, abs(f(x) - f(y)) is about abs(f'(x)) * abs(x-y), so the second rule amounts to abs(f(x) - f(y)) < abs(f'(x)) * 0.001. And since f is a free variable, you can make f'(x) any number at all, which is morally the same thing as saying ds1/ds2 is any positive quantity.
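To make that f'(x) scaling concrete, here is a small Monte Carlo check. The linear f(x) = 10x and the uniform draws are illustrative choices, not part of the argument above: with f'(x) = 10 everywhere, the rule abs(f(x) - f(y)) < 0.001 accepts roughly 10 times fewer pairs than abs(x - y) < 0.001, which is exactly the indeterminate ratio at issue.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(0.0, 1.0, n)

eps = 0.001
f = lambda t: 10.0 * t  # hypothetical f with f'(x) = 10 everywhere

rule_xy = np.abs(x - y) < eps           # "x and y are close"
rule_f = np.abs(f(x) - f(y)) < eps      # "f(x) and f(y) are close"

# The two rules differ in acceptance rate by a factor of ~f'(x) = 10
ratio = rule_xy.sum() / rule_f.sum()
print(ratio)
```

The ratio comes out near 10, i.e. near f'(x); a nonlinear f would make the factor vary from point to point, but the same indeterminacy remains.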

P(R)Product{P(R_i|R)}

The rest of the model consists of the conditional distributions that ensure that for a given i and marginal of the rest of the variables, Xtrue_i and Ytrue_i are independent in the prior.

P(R)Product{P(R_i|R)P(Ytrue_i | R_i)P(X_i, Y_i | R_i, Ytrue_i)}
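A forward simulation of that factorization might look like the sketch below. Every distributional choice here (the normals, the particular means and scales, Ytrue_i not actually varying with R_i) is a placeholder assumption made just so the structure runs; nothing above implies these specific distributions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5  # number of experiments (arbitrary for the sketch)

# P(R): hyperprior on the overall ratio (hypothetical choice)
R = rng.normal(2.0, 0.5)

# P(R_i | R): per-experiment ratios scatter around R
R_i = rng.normal(R, 0.1, size=N)

# P(Ytrue_i | R_i): taken as a fixed normal here for simplicity,
# though in general it can depend on R_i
Ytrue_i = rng.normal(1.0, 0.2, size=N)

# P(X_i, Y_i | R_i, Ytrue_i): noisy measurements; the true numerator
# is defined through the ratio, Xtrue_i = R_i * Ytrue_i
Xtrue_i = R_i * Ytrue_i
X_i = rng.normal(Xtrue_i, 0.05)
Y_i = rng.normal(Ytrue_i, 0.05)

print(R, R_i, X_i, Y_i)
```

The point of writing it out is only to show where each factor in P(R)Product{...} enters the generative story.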

(I don't think having Xtrue_i and Ytrue_i independent in the prior is necessarily desirable for Bayesians or anything -- it's just the condition that ensures that credible regions (marginal of the other variables) are also exact confidence regions and I want to see what enforcing that condition does for the confidence coverage of the credible regions of R (when likelihoods and priors aren't being approximated and mangled). In one sense the indeterminacy of the prior is an advantage because one could tune it or even make it data-dependent to get good coverage -- presumably that's what a confidence distribution theorist would do -- but I want to be as Jaynesian as possible here.)

P(R | R1...RN) Product(P(Ri|Yi)P(Yi|Datai))

Looks well constructed at first glance

It's worth noting that none of this depends on the definition of R as the ratio of normals. You get into very similar issues with a difference of normal means as the estimand of interest:

Delta = Xtrue - Ytrue = Ptrue - Qtrue

instead of

R = Xtrue/Ytrue = Ptrue/Qtrue

where again Xtrue and Ytrue are prior-independent marginal of the other variables and Ptrue and Qtrue are prior-independent marginal of the other variables by construction. That is,

p(delta, y_true, q_true) = p(delta) p(y_true | delta) p(q_true | delta)

and there's only enough freedom in the marginal of Delta to cancel out the delta-dependence of one of the two conditional priors.

Then we can define R_i, the ratio for each experiment. We can say that by doing this experiment in multiple ways at multiple labs, we can at least on average across the labs produce something that has net zero bias, and do something like

R_i ~ normal(R_ultimate,sigmaBias)

where R_ultimate is the real thing we're doing inference on. Then, without any logical difficulties involving defining multiplication on generalized distributions, and/or free parameters that are the limiting ratio of two infinitesimals...

p(R_i | y_i) p(y_i) = p_xi(R_i*y_i) |y_i| p(y_i)

and all the R_i pool through the hyperparameters R_ultimate and sigmaBias

and I think we're going to get a well-posed problem, where the mysterious "slack" in the previous version is now the role played by the distribution over the collection of R values, which you get by acknowledging the small biases introduced by doing the experiment in different labs, with different apparatus, different data collectors, different times, different measurement instruments, and whatnot.
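As a numerical sketch of that pooling (the flat prior on R_ultimate, the fixed sigmaBias, and the per-lab ratio estimates with known standard errors are all assumptions made purely for illustration): each lab contributes a likelihood normal(R_hat_i | R_ultimate, sqrt(sigmaBias^2 + se_i^2)), and a grid posterior combines them.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical per-lab ratio estimates and their standard errors
R_hat = np.array([1.9, 2.1, 2.0, 2.3])
se = np.array([0.10, 0.15, 0.08, 0.20])

sigma_bias = 0.1  # assumed known here; a full model would give it a prior

# Grid posterior for R_ultimate under a flat prior, using
# R_hat_i ~ normal(R_ultimate, sqrt(sigma_bias^2 + se_i^2))
grid = np.linspace(1.0, 3.0, 2001)
dx = grid[1] - grid[0]
total_sd = np.sqrt(sigma_bias**2 + se**2)
log_post = norm.logpdf(R_hat[None, :], loc=grid[:, None],
                       scale=total_sd[None, :]).sum(axis=1)
post = np.exp(log_post - log_post.max())
post /= post.sum() * dx

post_mean = np.sum(grid * post) * dx
print(post_mean)  # pooled estimate, lands between the lab estimates
```

With gaussians throughout this is just a precision-weighted average, with sigmaBias setting a floor on how much any single very precise lab can dominate.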

integrate(p(x=Ry, p=Rq | y,q) p(y|D1) p(q|D2) dy dq) = p(R|D1,D2)

= integrate(Delta(x/y - R) Delta(p/q - R) p(y|D1) p(q|D2) dy dq)

Now, in general it's not possible to define multiplication on generalized distributions such as the delta function. But you can use IST (Internal Set Theory) or, what amounts to the same thing, some particular limiting process to define this.

Define the nonstandard family of pre-delta functions as normal_pdf(x/y - R, ds1), where ds1 is an infinitesimal standard deviation, and normal_pdf(p/q - R, ds2) for infinitesimal ds2.

Now, basically we can see that the result we get depends on the relative size of the two slacks, the ratio ds1/ds2, which is a free parameter and can be any positive number. In other words: how much relative precision does experiment D1 give us compared to D2? We need to weight the evidence of these two experiments somehow.

Of course we might get a different result if we use a different pre-delta function, but the essential fact is not that we could use gaussians vs uniforms vs triangle distributions vs whatever; it's that we have this totally free parameter ds1/ds2 that we need to specify.
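The sensitivity to how the conditioning slab is parametrized shows up already in a toy Monte Carlo (the iid normal(1,1) pair and the conditioning on x = y are illustrative stand-ins, not the model above): conditioning on the measure-zero event x = y through the slab abs(x - y) < eps versus the slab abs(x/y - 1) < eps gives visibly different conditional means for x, because the second slab's width varies like eps*abs(y) and so carries an extra factor of abs(y) into the limit.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4_000_000
x = rng.normal(1.0, 1.0, n)
y = rng.normal(1.0, 1.0, n)
eps = 0.05

# Two "pre-delta" versions of the same measure-zero event x = y:
slab_diff = np.abs(x - y) < eps          # flat slab in x - y
slab_ratio = np.abs(x / y - 1.0) < eps   # slab in x/y, width ~ eps*|y|

mean_diff = x[slab_diff].mean()
mean_ratio = x[slab_ratio].mean()
print(mean_diff, mean_ratio)
```

The two conditional means do not converge to each other as eps shrinks: the first slab gives the density proportional to p(x)p(x), while the second weights it by abs(x). That's the same freedom as the ds1/ds2 choice above, just in a setting small enough to simulate.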
