# A Generative Modeling Challenge

So, I've been discussing declarative vs generative models. Here's a case where I think the declarative model makes good sense, how would you model this?

y is a vector of 1000 timeseries samples from some audio system. On one end is a highly sensitive microphone that puts out a voltage signal proportional to the instantaneous pressure gradient. The voltage is in the range [0,0.01] volts and is known to have a slew-rate limit of at most 0.005 v per sample and a bias of 0.005 +- 0.0006. A pre-amplifier increases this signal by pretty close to 100x but has a response function which has a known shape over the input range [0,0.01], is not perfectly linear, and this shape is multiplied by a number A slightly less than 100 to get the actual transfer function. The pre-amp output picks up radio frequency noise, which can be modeled as normal(0,0.015), and the result is converted via 10 bit A/D into the range 0-1023, with 1v in to the A/D implying 1024. This digital signal is sent down a noisy channel, so that only some of the samples are received in digital form. Additionally, as a compromise to improve reliability and due to lack of equipment, the signal is re-converted to analog via a 12 bit D/A outputting in the range 0-10v, which has a known noisy bias signal due to a failing diode that can be modeled as independent exponentially distributed voltages with mean value close to 0.05. The result is sent over a long cable of around 1000 feet, where additional normally distributed white noise is added to the signal whose RMS amplitude is less than 0.01 v. The long cable has about 1 ohm resistance per 100 feet +- 5 % as well as a resistance of less than 10 ohms at both the input and output connections. The whole thing is put into a 16 bit A/D operating on inputs in the range 0-10v with an unfortunately low input impedance of close to 1000 ohms +- 1 %. The 16 bit A/D is sufficiently high resolution that y is given as a floating point number equal to the digital value on 0-(2^16-1) / 2^16 and assumed to be continuous with an additional conversion noise and bias which is approximately gamma distributed with k=3 and rate = 1/0.003

Reconstruct the original signal y_true out of the microphone as best you can using as much of the information available as possible, while also giving best estimates for the length of the wire, the total resistance of the cable, the amplification factor for the pre-amp, and the input impedance of the final A/D.

The "generative" picture here is this (more or less, we're ignoring low-pass filters that are always present on the d/a and a/d steps):

y_meas = round(((round(1024*A*f_pre(y_true)+epsilon_rf)+da_bias_noise)*10* ((rperlen*Len+conn_imp)/(rperlen*Len+inpt_imped+conn_imp))+cable_noise)/10*2^16)/2^16+ad16_noise

And, in addition, we have some of the digital values round(1024*A*f_pre(y_true)+epsilon_rf) let's call the ones we have y_digit, which is a sparse vector and t_digit which tells us which of the y_meas the y_digit correspond to.

So, if you wanted to write this in a generative form, with only data or only parameters on the left hand side of the probabilistic statements, in Stan, what would you use to define the likelihood function for y_meas or y_digit given all the parameters, y_true, epsilon_rf, da_bias_noise... etc?

For the full analog / digital / analog / digital / analog path, You could always invent some extra roundoff error parameters, and write

y_meas - (....) ~ gamma(3,1/0.003);

where (...) = the whole mess where round(x) is replace with x+roundoff_i and roundoff_i is given the appropriate prior. But then you'd be writing a declarative model with a nonlinear function of the data and the parameters (consider the resistances formula) on the left hand side.

Stan doesn't have a shifted gamma, but if it did, you could use the shifted gamma distribution and place the round function in the location parameter, however, you will probably induce bad behavior due to the non-differentiable jumps.

The point of the declarative version is that *if you were given the actual values of the parameters* other than ad16_noise, then subtracting y_meas - (...) would give you an expression equal to the noise sample, and the likelihood for seeing that sample conditional on all the other parameters, is the gamma distribution.

If you have the wrong values of the parameters, you won't get something that's gamma distributed, so you'll be in the low probability region of the gamma distribution function, so you'll downweight that portion of the parameter space... that's all you need to get inference on the parameters, that you find reasons to reject some of the prior space because it doesn't fit the facts you know.

Haven't you done all of the hard work of making a generative model already? I'm not very proficient in Stan, but can't you just do the following?

ad16_noise ~ gamma(3,1/0.003);

cable_noise ~ normal(0, cable_noise_rms);

cable_noise_rms ~ unif(0,0.01);

conn_imp ~ unif(0,10);

// and so on until...

y_meas = round(((round(1024*A*f_pre(y_true)+epsilon_rf)+da_bias_noise)*10* ((rperlen*Len+conn_imp)/(rperlen*Len+inpt_imped+conn_imp))+cable_noise)/10*2^16)/2^16+ad16_noise

I should clarify, obviously the generative picture is clear. This is more about how to actually implement the model in a generative way in Stan or another language.

The big issue, is the final step.

y_meas ~ ??????

there are two issues, the first one is that the "round" function is non-differentiable, so mucks with Stan's hamiltonian monte carlo pretty heavily, and the second is that it's only the difference between that big generative thingy and y_meas which is gamma distributed. There's no way in Stan to say:

y_meas = some_complicated_function of parameters.

you can only say y_meas ~ some_distribution_function

What is the distribution you should put on y_meas given that

y_meas - (...) ~ gamma(foo,bar);

which is the declarative statement I'd use in _my_ model

I guess the point of all of this is that forcing yourself into a mold where you're only ever putting either the name of a prior, or the name of a data variable on the left hand side of a ~ statement, so that you've expressed your model in a directly generative way (ie. where data is the output of a special RNG described on the right of the ~) is stifling.

Also I should point out that the slew rate limit was a trap too. it implies

y_act[2:1000] - y_act(1:999) ~ uniform(-0.005,0.005)

You can do this as an iteration, and the distribution of y_i then is windowed within the reachable range, and depends on y_i-1... but again, it's a huge stretch and obfuscation of what's going on compared to the above declarative statement.

I'll assume you know Stan's limitations better than me; in that context, this seems like a real advantage for the declarative approach. Bob Carpenter seems receptive and has AFAIK has written most of the users' manual, so I bet if you could convince him, he could convince AG.

In a more general context, it seems like it wouldn't be too hard to back out enough block-conditional posteriors from the generative model in closed (or closed-enough form that I could use block differential evolution MCMC to get a sampler with the right stationary distribution. (Efficiency would be a concern, naturally, but when is it not?)