What I can imagine is that people take what is essentially a continuous thing with a very rapid change, and approximate it as a step function. Here, undoing the approximation leads to a function that is more realistic but less convenient to represent.

Think of it this way: take any discontinuous loss function at all that people might actually use, expressed in dimensionless variables that are O(1), and convolve it with normal(0,1e-37). It is now an infinitely smooth function. Does it represent the real-world badness less well?

Even if you say something like there is a computer program which decides whether you get some good thing or not, and it has a precise step function in it... so that the convolved version really is problematic right at the transition (say between "you get 0" and "you win the lottery"), the convolved version is just a device inside an integral that allows you to get a decision, so does it lead to a meaningfully different decision? I can't see it. The integral already gives the transition region basically infinitesimal weight, having that infinitesimal weight spread out by 1e-37 in any real world problem seems unlikely to matter. And if it does matter, what the heck kind of decision is that?
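Here is a minimal sketch of that argument under made-up assumptions (the setup is mine, not from any real problem): the unknown theta is Normal(0, 1), you pick a protection level a at a cost of 0.1 per unit, and you lose 1 if theta exceeds a. A unit step convolved with normal(0, s) is just a normal CDF, and its expectation under a normal belief has a closed form, so no numerical integration is needed.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def smoothed_step(x, c, s):
    """Unit step at c convolved with Normal(0, s): smooth for any s > 0."""
    return normal_cdf((x - c) / s)

def expected_loss(a, s, mu=0.0, sigma=1.0, cost=0.1):
    """E[cost*a + L(theta - a)] with theta ~ Normal(mu, sigma).

    L is a unit step at 0 smoothed with width s; s = 0 is the hard step.
    Uses E[Phi((theta - a)/s)] = Phi((mu - a) / sqrt(sigma^2 + s^2)).
    """
    return cost * a + normal_cdf((mu - a) / math.hypot(sigma, s))

def best_action(s, grid):
    """Grid point with the smallest expected loss."""
    return min(grid, key=lambda a: expected_loss(a, s))

grid = [i / 1000.0 for i in range(0, 4001)]  # candidate actions 0.000 .. 4.000
hard = best_action(0.0, grid)     # original discontinuous loss
tiny = best_action(1e-37, grid)   # smoothed with normal(0, 1e-37)
wide = best_action(1e-3, grid)    # a vastly coarser smoothing, for contrast
print(hard, tiny, wide)
```

In double precision the 1e-37 case is not merely close: sqrt(1 + 1e-74) rounds to exactly 1, so the hard and smoothed expected losses are the same number and the chosen action is identical. Even the 1e-3 smoothing should move the chosen action by at most about one grid step here.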

The uniqueness thing, I admit I don't know, because I don't know what a "space of abstract decision rules" looks like. The space of Bayesian rules is easier to understand (it involves basically just choosing priors).

The boundedness of the parameter space can be forced by transforming the region [0,1] through a function that goes to -inf at 0 and +inf at 1.
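One function with that property is the logit; this is my choice for illustration, and any monotone map with the same limits would do. A small sketch, with `logit` and `inv_logit` as names I've picked for this example:

```python
import math

def logit(p):
    """Map p in (0, 1) onto the real line: -> -inf as p -> 0, +inf as p -> 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse map from the real line back into [0, 1], written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

for p in (1e-12, 0.3, 0.5, 0.999999):
    x = logit(p)
    print(p, x, inv_logit(x))  # round trip recovers p up to rounding
```

Going the other way, `inv_logit` squeezes any unbounded real-valued parameter into a bounded interval, so an unbounded problem can be restated as a bounded one.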

The compactness of the decision rule space... in real practical terms, let's just allow only decision rules that can be programmed into a computer in less than 10 million bytes of computer code in the language of your choice. If your reason for not using Bayes is that you want to sit down and write a 300 million line computer function that is your decision rule... you probably need a punch in the nose.

There are basically 3 ways to do stuff:

1) Do some stuff, just whatever works for you. I can't comment on this really, but lots of stuff is done this way.

2) Come up with a principled way to choose what stuff to do: there are basically two *principled* views of statistics, Frequentist principles, and Bayesian principles. Then just do whatever someone told you.

3) Come up with some principles, and actually follow them.

Almost all of what I've seen done in the "based on Frequentist principles" camp is done by people who are somewhere in camp (2): they simply do what their textbook taught them the principles were. They're like the person in the bed who gets hit by the truck... not really morally involved in any of this, they just did the best they knew based on what they were taught in the textbooks... Note that *lots* of actual civil engineers design bridges and things just following the design specifications in the code or the textbook or whatever...

On the other hand, where the principles ought to be... namely in the textbook... the principles that should be taught are that you should analyze a problem in a way that produces least average badness for society or the like. You have choices about how to do stuff!! They are explicit, and they have moral content.

So, please pull out the big pile of textbooks on standard statistical methods that describe in detail early on, the principles required to get least average badness given your uncertainty?

On my shelf I have various things I bought back in the day: Heiberger and Holland, Venables and Ripley, Fox Applied Regression...

Heiberger and Holland seems like a bog standard Masters degree stats text. I got it because it was basically the only thing Cody's had (a venerable high quality book store on Telegraph Ave in Berkeley, closed about 15 years ago but used to be THE source for academics in the bay area).

Here's what it says in the chapter "Statistics Concepts" under the heading "Estimation: Criteria for Point Estimators":

There are a number of criteria for what constitutes "good" point estimators. Here is a heuristic description of some of these.

Unbiasedness....

small variance....

consistency....

sufficiency...

where "..." elides a description of what those mean

And I'm sorry, but that seems to me to be morally outrageous considering that tons of people will go off to become biostats masters grads and then start running clinical trials on things that actually have the potential to kill people, like cancer drugs or whatnot.

Pretty much EVERYTHING you find from there on will be examples of plugging and chugging to find ML estimates or least squares, or do various hypothesis tests or whatever. Not ONCE will you find an example problem that looks like:

Dr. Margulin knows that his patient is dying of an overdose of a certain drug. Dr. Margulin knows the patient's height, weight, age, sex and which drug the patient took, but has only a very noisy estimate of the dose that the patient took. If Dr. Margulin gives too little of the antidote, the patient will die in the next few hours; if he gives too much of the antidote, the patient's kidneys and liver will fail and he will die after a few days. Fortunately, previous studies of this drug in monkeys have given us a lot of data on dosing. We can approximate the badness of the outcome by B(Dose/PatientMass,AntidoteDose/PatientMass), a given function. Given the dataset below, how do you design a chart that will inform Dr. Margulin of the best dose to give to his patient?
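To make the shape of the answer concrete, here is a toy sketch for one row of such a chart. Everything in it is invented for illustration: the badness function (underdosing counted as 10 times worse per unit than overdosing, one unit of antidote neutralizing one unit of drug), the lognormal draws standing in for the uncertainty about the dose, and the grid of candidate antidote doses. None of it is medical fact or data from any study.

```python
import random

def badness(dose, antidote):
    """Hypothetical B(dose/mass, antidote/mass): underdosing costs 10x overdosing."""
    gap = antidote - dose  # assumes 1 unit of antidote neutralizes 1 unit of drug
    return 10.0 * -gap if gap < 0 else gap

def average_badness(antidote, dose_draws):
    """Average badness of one antidote dose over the uncertain drug dose."""
    return sum(badness(d, antidote) for d in dose_draws) / len(dose_draws)

def best_antidote(dose_draws, grid):
    """Candidate antidote dose with the least average badness."""
    return min(grid, key=lambda a: average_badness(a, dose_draws))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Stand-in for the noisy knowledge of dose per unit body mass (made up).
dose_draws = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
grid = [i / 100.0 for i in range(0, 501)]  # candidate antidote doses 0.00 .. 5.00

mean_dose = sum(dose_draws) / len(dose_draws)
best = best_antidote(dose_draws, grid)
print("plug-in (match the mean dose):", mean_dose, average_badness(mean_dose, dose_draws))
print("least average badness:", best, average_badness(best, dose_draws))
```

Because underdosing is penalized more heavily in this made-up B, the least-average-badness dose sits well above the plug-in answer of matching the estimated mean dose. A chart would repeat this calculation for each combination of patient inputs.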

Yet it seems to me this or something like it should be a textbook problem in every stats text. You simply can't do principled statistics from a Frequentist perspective by sticking to mean squared error and maximum likelihood.... without committing accidental atrocities.

Mean Squared Error and Maximum Likelihood are taught *as if they were the meaningful principles*, but the meaningful principle is "choose something that minimizes/maximizes something else".
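A tiny sketch of that point, with a made-up data set and function names of my own choosing: the same "pick the value that minimizes total loss" recipe gives different answers depending on which loss you plug in. Squared error is just one choice among many.

```python
def best_value(data, loss, candidates):
    """Generic principle: choose the candidate minimizing total loss over the data."""
    return min(candidates, key=lambda a: sum(loss(x, a) for x in data))

def squared_error(x, a):
    return (x - a) ** 2

def absolute_error(x, a):
    return abs(x - a)

data = [1, 2, 3, 4, 100]    # toy numbers with one extreme value
candidates = range(0, 101)  # integer candidates 0..100

mse_choice = best_value(data, squared_error, candidates)   # lands on the mean
abs_choice = best_value(data, absolute_error, candidates)  # lands on the median
print(mse_choice, abs_choice)
```

With squared error the recipe returns the mean of the data, and with absolute error it returns the median. Neither is "the" answer until you've said what badness you actually care about.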
