# Inference on a ratio from two separate experiments (a case study in Stan awesomeness)

Andrew Gelman's blog has a post about a case where you do two separate experiments: in one you measure X and Y with measurement error and want to infer the true ratio of the parameters Xbar/Ybar, and in the other you do the same for Pbar/Qbar, where both ratios are supposed to be the same quantity. Here is the R code to do the complete analysis:

Note that we want p(R | Xtrue, Ytrue, Ptrue, Qtrue) ∝ p(R | Xtrue, Ytrue) p(R | Ptrue, Qtrue), and each of these factors is a delta function (that is, if you tell me Xtrue and Ytrue, I know R exactly)... so we have to approximate each delta function with a sufficiently tightly peaked normal distribution. Basically we tell Stan to stay in the region where the two experiments' ratios agree to within epsilon = 0.001.

```r
## inference on a ratio via two separate experiments
library(rstan)

dataset <- data.frame(X = rnorm(10, 1, .1),
                      Y = rnorm(10, 2, .3),
                      P = rnorm(10, 30, 6),
                      Q = rnorm(10, 60, 4))

## now the true value of R = 1/2 = 30/60, but we have two experiments
## where we measure X,Y or we measure P,Q with measurement error that
## is of known size (we know the .1 and .3 and 6 and 4 used as
## standard deviations; assume we have a well-calibrated instrument).

stancode <- "
data {
  vector[10] X;
  vector[10] Y;
  vector[10] P;
  vector[10] Q;
}
parameters {
  real<lower=0> Xtrue;
  real<lower=0> Ytrue;
  real<lower=0> Ptrue;
  real<lower=0> Qtrue;
  real<lower=0> Rat;
}
model {
  // we assume on physical principles that these are positive and have
  // an upper-bound order-of-magnitude estimate
  Xtrue ~ uniform(0, 1000);
  Ytrue ~ uniform(0, 1000);
  Ptrue ~ uniform(0, 10000);
  Qtrue ~ uniform(0, 10000);

  X ~ normal(Xtrue, .1);
  Y ~ normal(Ytrue, .3);
  P ~ normal(Ptrue, 6);
  Q ~ normal(Qtrue, 4);

  // constrain the ratios to agree to within a small epsilon
  Rat ~ normal(Xtrue / Ytrue, .001);
  Rat ~ normal(Ptrue / Qtrue, .001);
}
"

samps <- stan(model_code = stancode, data = dataset)
samps
```

When I run it, I get the following output:

```
Inference for Stan model: 5100ead2741217b889326da6ff0f0419.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

       mean se_mean   sd   2.5%   25%   50%   75% 97.5% n_eff Rhat
Xtrue  1.02    0.00 0.03   0.97  1.01  1.02  1.04  1.08  2105    1
Ytrue  2.06    0.00 0.08   1.91  2.00  2.06  2.11  2.22  1568    1
Ptrue 30.24    0.04 1.31  27.71 29.34 30.23 31.15 32.87  1410    1
Qtrue 60.64    0.03 1.22  58.20 59.85 60.66 61.43 63.06  2172    1
Rat    0.50    0.00 0.02   0.46  0.48  0.50  0.51  0.54   927    1
lp__  -6.28    0.04 1.59 -10.39 -7.05 -5.95 -5.11 -4.21  1588    1

Samples were drawn using NUTS(diag_e) at Sun Apr 9 19:37:53 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

Note that with 10 observations in each experiment, the posterior mean of Rat is 0.50 with a Monte Carlo standard error that rounds to 0.00, and almost all the samples of Rat fall between about 0.46 and 0.54.

How about:

```stan
parameters {
  real logYtrue;
  real logQtrue;
  real logRat;
}
transformed parameters {
  real Ytrue = exp(logYtrue);
  real Xtrue = exp(logYtrue + logRat);
  real Qtrue = exp(logQtrue);
  real Ptrue = exp(logQtrue + logRat);
}
```

Yes, then you drop the approximate normal constraints Rat ~ normal(..., 0.001) and it's exact, right? Good old algebra.

There's a folk theorem here somewhere: "wherever you want a delta function, you can replace it with some clever algebra"

I think you have to be careful about the implied priors. If you take logYtrue and so forth as your parameters and don't declare explicit priors, that implies uniform priors on the whole log space, which in turn implies really weird improper priors on the Ytrue, Qtrue, and Rat scales.
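One way to address that is to declare the priors explicitly on the log scale. Here is a minimal sketch of such a model block (the particular normal priors are illustrative assumptions, chosen only to roughly match the order-of-magnitude bounds in the original model, not anything from the post):

```stan
model {
  // explicit priors on the log scale (illustrative choices, not from the
  // original model): centered so Ytrue is O(1), Qtrue is O(50), and Rat
  // is a priori near 1 but only weakly constrained
  logYtrue ~ normal(0, 3);
  logQtrue ~ normal(4, 3);
  logRat   ~ normal(0, 3);

  // same measurement model as before, with the known instrument SDs
  X ~ normal(Xtrue, .1);
  Y ~ normal(Ytrue, .3);
  P ~ normal(Ptrue, 6);
  Q ~ normal(Qtrue, 4);
}
```

With the priors on the raw parameters logYtrue, logQtrue, logRat, no Jacobian adjustment is needed; it would only be needed if you put priors on the exp-transformed quantities instead.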

I don't think the log transform is really required; you can just do multiplication, right?

```stan
parameters {
  real Ytrue;
  real Qtrue;
  real Rat;
}
transformed parameters {
  real Xtrue;
  real Ptrue;
  Xtrue = Rat * Ytrue;
  Ptrue = Rat * Qtrue;
}
model {
  // priors on Ytrue, Qtrue, Rat go here
  X ~ normal(Xtrue, foo);
  Y ~ normal(Ytrue, bar);
  P ~ normal(Ptrue, baz);
  Q ~ normal(Qtrue, quux);
}
```
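Putting the pieces together, a complete sketch of this multiplicative version with the known SDs from the original experiment filled in might look like the following (the positivity constraints and order-of-magnitude upper bounds mirror the first model and are assumptions; declaring them as bounds gives implicit proper uniform priors without separate `~ uniform` statements):

```stan
data {
  vector[10] X;
  vector[10] Y;
  vector[10] P;
  vector[10] Q;
}
parameters {
  // bounds mirror the order-of-magnitude priors in the first model;
  // the declared bounds imply proper uniform priors on these ranges
  real<lower=0, upper=1000> Ytrue;
  real<lower=0, upper=10000> Qtrue;
  real<lower=0, upper=100> Rat;
}
transformed parameters {
  // the ratio constraint holds exactly, by construction
  real Xtrue = Rat * Ytrue;
  real Ptrue = Rat * Qtrue;
}
model {
  // measurement model with the known instrument SDs
  X ~ normal(Xtrue, .1);
  Y ~ normal(Ytrue, .3);
  P ~ normal(Ptrue, 6);
  Q ~ normal(Qtrue, 4);
}
```

Since Rat is now a raw parameter rather than a soft constraint, its posterior comes out directly, with no epsilon to tune.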