Thoughts on MaxEnt distributions, demographics, and computation
When working with social data, you frequently get binned information. For example, a survey might tell you the quartiles or deciles of the age distribution for males in the US, or the percentage of people whose age falls between certain fixed values. If you're lucky, you might also find out something about the people within each quartile, such as their mean age.
Suppose you'd like to predict some quantity from this kind of information. For example, you might have a graph of the percentage of people with a given educational attainment born before each year, the age quartiles of the current population, and a predictive equation for, say, income as a function of educational attainment and age. From these you'd like to calculate the average income for males between 20 and 45 years of age today, and for males who were between 20 and 45 years of age 20 years ago.
This is a made-up example, but it's typical of the kind of thing I'm thinking of. In particular, you might like to take repeated cross-sectional data and infer trajectories through time for individuals, even though you don't have repeated measures on the same people. For example, you might generate virtual people born in 1940 and have them follow earnings trajectories which, put together, replicate the cross-sectional data from 1960, 1970, 1980, 1990, and 2000. You could then estimate something like what the distribution of household wealth would have been under a different policy (given, say, a simple causal model for what savings would have been under that policy).
The answer, of course, is that you use maximum entropy. But the maximum entropy distribution of interest is a complicated one, so you might want to maximize the entropy numerically.
If you want to do something like this in Stan, where you're simultaneously doing Bayesian inference on parameters and finding the parameters of a distribution that maximize some measure of entropy for some prior, how do you go about it? I don't think there is an easy answer.
It might be good to come up with a simpler and more tractable example problem. So, for example, suppose you know that the quartiles of age in some population are 23, 44, and 70 years, and that the average age is 9 between 0 and 23, 31 between 23 and 44, 62 between 44 and 70, and 81 over 70.
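As a point of reference for this toy problem, here is a minimal Python sketch of the unrestricted maxent solution, before any mixture family is imposed. It relies on a standard result: with only the bin probabilities and within-bin means fixed, the maximum entropy density is proportional to exp(theta * x) within each bin, with a separate theta per bin. The function names (truncated_exp_mean, solve_theta) are made up for illustration, and the bisection bracket is an assumption that happens to cover these numbers.

```python
import math

def truncated_exp_mean(theta, width):
    """Mean of the density proportional to exp(theta * t) on [0, width]."""
    z = theta * width
    if abs(z) < 1e-6:
        # Series expansion near theta = 0, where the density is nearly uniform.
        return width / 2 + theta * width**2 / 12
    return width / (-math.expm1(-z)) - 1 / theta

def solve_theta(lo, hi, target_mean, iters=200):
    """Bisection for the theta whose truncated exponential on [lo, hi] has the target mean."""
    width = hi - lo
    offset = target_mean - lo
    a, b = -20.0 / width, 20.0 / width  # assumed bracket; enough for means not too near an edge
    if not truncated_exp_mean(a, width) < offset < truncated_exp_mean(b, width):
        raise ValueError("target mean too close to a bin edge for this bracket")
    for _ in range(iters):
        mid = 0.5 * (a + b)
        # The mean is increasing in theta, so move the bracket toward the target.
        if truncated_exp_mean(mid, width) < offset:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

# (lower edge, upper edge, within-bin mean) for the three bounded quartile bins.
bins = [(0.0, 23.0, 9.0), (23.0, 44.0, 31.0), (44.0, 70.0, 62.0)]
thetas = [solve_theta(lo, hi, m) for lo, hi, m in bins]

# The open bin above 70 is an ordinary exponential tail with mean 81 - 70 = 11.
tail_theta = -1.0 / (81.0 - 70.0)

for (lo, hi, m), theta in zip(bins, thetas):
    print(f"bin [{lo:g}, {hi:g}]: theta = {theta:+.4f}")
print(f"bin [70, inf): theta = {tail_theta:+.4f}")
```

Each bin then carries probability 0.25, spread according to its own exponential tilt. The resulting density has jumps at the quartile boundaries, which is one reason a smooth family such as a mixture may be preferable in practice.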
Suppose also that we have some function Q(x), and we want, in Stan, to approximately identify the Gaussian mixture model with 3 components (8 degrees of freedom: the mean and SD of each component, plus the mixture weights, which must sum to 1) that maximizes entropy subject to those 7 constraints, and then to calculate, in generated quantities, the mean value of Q(x) over a sample of 1000 points drawn from that maxent distribution. I'll even add in some leeway, as if the numbers above were rounded off: your maxent distribution only needs to satisfy the constraints to within ±0.5% for the percentages and ±0.5 years for the ages.
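To make the pieces of this problem concrete, here is a rough Python sketch (not Stan, and not a solution) of the quantities any optimizer or Stan program would have to evaluate for a candidate 3-component mixture: the quartile and within-bin-mean constraints, the entropy, and the sample mean of Q(x). All function names are hypothetical. The sketch assumes the first bin extends to minus infinity, since a Gaussian mixture puts some mass below age 0, and it reads the tolerances as ±0.005 on the cumulative probabilities at 23, 44, and 70 and ±0.5 years on the bin means.

```python
import math
import random

EDGES = [-math.inf, 23.0, 44.0, 70.0, math.inf]  # assumption: first bin unbounded below
TARGET_CDF = [0.25, 0.50, 0.75]                  # cumulative probability at 23, 44, 70
TARGET_MEANS = [9.0, 31.0, 62.0, 81.0]           # mean age within each quartile bin

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bin_stats(weights, means, sds):
    """Return (bin probabilities, within-bin means) for the mixture."""
    probs, cond_means = [], []
    for a, b in zip(EDGES[:-1], EDGES[1:]):
        mass, partial = 0.0, 0.0  # partial accumulates E[x * 1{a < x <= b}]
        for w, mu, sd in zip(weights, means, sds):
            za, zb = (a - mu) / sd, (b - mu) / sd
            m = norm_cdf(zb) - norm_cdf(za)
            mass += w * m
            # Closed-form partial expectation of a normal over (a, b].
            partial += w * (mu * m + sd * (norm_pdf(za) - norm_pdf(zb)))
        probs.append(mass)
        cond_means.append(partial / mass if mass > 0 else math.nan)
    return probs, cond_means

def mixture_entropy(weights, means, sds, n=20001):
    """Differential entropy -integral of p log p, by a Riemann sum on a wide grid."""
    lo = min(mu - 10 * sd for mu, sd in zip(means, sds))
    hi = max(mu + 10 * sd for mu, sd in zip(means, sds))
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        p = sum(w * norm_pdf((x - mu) / sd) / sd
                for w, mu, sd in zip(weights, means, sds))
        if p > 0:
            total -= p * math.log(p) * h
    return total

def satisfies_constraints(weights, means, sds, prob_tol=0.005, age_tol=0.5):
    """Check the 7 constraints under the assumed reading of the tolerances."""
    probs, cond_means = bin_stats(weights, means, sds)
    cdf = [sum(probs[:k + 1]) for k in range(3)]
    ok_cdf = all(abs(c - t) <= prob_tol for c, t in zip(cdf, TARGET_CDF))
    ok_means = all(abs(m - t) <= age_tol for m, t in zip(cond_means, TARGET_MEANS))
    return ok_cdf and ok_means

def sample_mean_q(q, weights, means, sds, n=1000, seed=0):
    """Average q(x) over n draws from the mixture (q stands in for Q(x))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        total += q(rng.gauss(means[k], sds[k]))
    return total / n
```

A numerical search would then maximize mixture_entropy over the 8 free parameters while keeping satisfies_constraints true, for instance through a penalty term. How best to express that search inside a Stan model is exactly the open question here.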