Thinking about gamma priors
Suppose you've got a parameter that's a positive number whose order of magnitude you know. Take my height, for example. You don't know what it is, but if I asked you whether it was negative you'd say with 100% logical certainty that it wasn't, if I asked you whether it was 3 inches you'd be damn sure it wasn't, if I asked you whether it was 5 ft you might think that's reasonable, and 20 ft is certainly unreasonable... A typical average height for an adult male is something like 5 ft 10 in (178 cm). So it would be safe to say something like
h ~ exponential(1.0/178)
for an adult male height in cm. But the peak density of this distribution is at h=0, and it puts appreciable probability out at 3 to 4 times the average height. That seems problematic.
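To put rough numbers on that complaint, here is a minimal sketch using the closed-form quantile and tail of an exponential distribution. The function names and the 178 cm constant are just for illustration; nothing here is specific to any modeling library.

```python
import math

MEAN_CM = 178.0  # typical adult male height quoted in the text


def exponential_quantile(p, mean):
    """Inverse CDF of an exponential distribution with the given mean."""
    return -mean * math.log(1.0 - p)


def exponential_tail(x, mean):
    """P(X > x) for an exponential distribution with the given mean."""
    return math.exp(-x / mean)


low = exponential_quantile(0.025, MEAN_CM)
high = exponential_quantile(0.975, MEAN_CM)
tail_3x = exponential_tail(3 * MEAN_CM, MEAN_CM)

print(f"central 95% interval: {low:.1f} to {high:.1f} cm")
print(f"P(h > 3 x average) = {tail_3x:.3f}")
```

The central 95% interval runs from under 5 cm to over 6.5 m, and about 5% of the probability sits above three times the average, which is far too vague for a height.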
Here's where the gamma distribution comes in. The average of a gamma(k,1/x) random variable is kx. In fact, if a and b are independent and both exponential(1/x), then their sum is gamma(2,1/x), and the sum of n such exponential variables is gamma(n,1/x) for integer values of n. But n doesn't need to be an integer. The shape parameter n is continuous, and it "acts like" a sample size. That is, if you hold the average constant as n increases by using gamma(n,n/x), the distribution becomes more and more delta-function-like around the average value x: its standard deviation is x/sqrt(n).
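A quick way to convince yourself of this is to compare the analytic mean and standard deviation of gamma(n,n/x) with a simulation that averages n exponential draws. This is only a sketch: the helper name, the seed, and the number of draws are arbitrary choices of mine, and it uses nothing beyond the Python standard library.

```python
import math
import random
import statistics

X = 178.0  # the average we hold fixed


def gamma_mean_sd(shape, rate):
    """Mean and standard deviation of a gamma(shape, rate) distribution."""
    return shape / rate, math.sqrt(shape) / rate


# Analytic view: the mean stays at X while the sd shrinks like X / sqrt(n).
for n in (1, 4, 12, 100):
    mean, sd = gamma_mean_sd(n, n / X)
    print(f"n={n:4d}  mean={mean:6.1f}  sd={sd:6.1f}")

# Simulation view: the average of n independent exponential(1/X) draws
# is distributed as gamma(n, n/X).
rng = random.Random(0)
n = 12
draws = [
    sum(rng.expovariate(1.0 / X) for _ in range(n)) / n
    for _ in range(20000)
]
sim_mean = statistics.mean(draws)
sim_sd = statistics.stdev(draws)
print(f"simulated: mean={sim_mean:.1f}  sd={sim_sd:.1f}")
```

The simulated mean and standard deviation should land close to the analytic values for n=12, up to Monte Carlo noise.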
If all you know is that a variable is positive and has a typical value, you can use exponential(1/x) as a maximum entropy distribution for a given average x. If you know that the distribution is more concentrated near the average (and away from both zero and infinity), then you can use gamma(n,n/x) as a continuously "less than maximum entropy" distribution, where n is an "effective sample size", and that interpretation can help you choose n. So if you think you have about as much information about my height as if you'd found, say, 12 randomly selected adult males from the US, you can use gamma(12,12.0/178) as a prior for my height.
What does that distribution look like? Its central 95% interval is about 92 to 292 cm (3 to 9.6 ft), which is probably pretty reasonable for a height, considering that I could for all you know be a dwarf or a basketball player...
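If you want to check that interval without a statistics package, one option is to use the fact that for integer shape the gamma CDF has a closed form (the Erlang CDF) and invert it by bisection. The function names below are hypothetical helpers of mine, and the sketch only handles integer shape.

```python
import math


def erlang_cdf(x, shape, rate):
    """CDF of gamma(shape, rate) for a positive integer shape."""
    if x <= 0:
        return 0.0
    lam = rate * x
    term = 1.0   # lam**i / i! for i = 0
    total = 1.0  # running sum over i = 0 .. shape - 1
    for i in range(1, shape):
        term *= lam / i
        total += term
    return 1.0 - math.exp(-lam) * total


def erlang_quantile(p, shape, rate):
    """Invert erlang_cdf by bisection."""
    lo, hi = 0.0, shape / rate
    while erlang_cdf(hi, shape, rate) < p:  # grow the bracket if needed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


shape, rate = 12, 12.0 / 178
low = erlang_quantile(0.025, shape, rate)
high = erlang_quantile(0.975, shape, rate)
print(f"central 95% interval: {low:.0f} to {high:.0f} cm")
```

For non-integer shape you would need a regularized incomplete gamma function instead of the finite sum used here.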
The gamma distribution is in fact a maximum entropy distribution in its own right, once you add one more constraint. Specifically, based on the explanation on Wikipedia:
The gamma distribution is the maximum entropy probability distribution for a random variable X for which E[X] is fixed and greater than zero, and E[ln(X)] is fixed. [edited for clarity]
That is, you know what the average is, and you know what the average logarithm is (think of the average logarithm as a limit on the order of magnitude).
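For anyone who wants to see where the gamma form comes from, here is a sketch of the standard Lagrange multiplier argument. The symbols are my own choices and not from the quoted passage: p(t) is the density, \mu and \nu are the two fixed averages, \lambda_0, \lambda_1, \lambda_2 are multipliers, \beta is the rate (1/x in the notation used above), and \psi is the digamma function.

```latex
\begin{align*}
\text{maximize}\quad & -\int_0^\infty p(t)\,\ln p(t)\,dt \\
\text{subject to}\quad & \int_0^\infty p(t)\,dt = 1,\quad
  \int_0^\infty t\,p(t)\,dt = \mu,\quad
  \int_0^\infty \ln(t)\,p(t)\,dt = \nu .
\end{align*}
% Setting the functional derivative of the Lagrangian to zero:
\begin{align*}
-\ln p(t) - 1 - \lambda_0 - \lambda_1 t - \lambda_2 \ln t &= 0 \\
\Rightarrow\quad p(t) &\propto t^{-\lambda_2}\, e^{-\lambda_1 t}
  = t^{k-1} e^{-\beta t},
  \qquad k = 1 - \lambda_2,\ \beta = \lambda_1 .
\end{align*}
% Matching the constraints fixes k and beta:
\begin{align*}
\mathbb{E}[X] = \frac{k}{\beta} = \mu, \qquad
\mathbb{E}[\ln X] = \psi(k) - \ln\beta = \nu .
\end{align*}
```

When the logarithm constraint is dropped (\lambda_2 = 0, so k = 1), the same argument gives back the exponential distribution, which is why gamma(n,n/x) with n > 1 reads as the exponential prior plus extra information about the order of magnitude.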