Regular readers of my blog will know that I'm a fan of IST (Internal Set Theory) nonstandard analysis. It fits so well with model building. So, I thought I'd clear up how I interpret "maximum likelihood". If you write down a likelihood for data $D$ with parameter vector $q$, which we'll write as $p(D|q)$, and you maximize it without putting any prior on $q$, then clearly the maximum occurs at the same place if you multiply the likelihood by any positive constant $C$.

Now, suppose for ease of exposition that $q$ is just a single parameter. In nonstandard analysis, we can choose a nonstandard integer $N$ and create a prior for $q$ which is $p(q) = 1/(2N)$ for any $q$ in the range $[-N,N]$ and zero otherwise. Clearly this is a constant for all standard values of $q$.

Now, do the maximization of this new nonstandard posterior. Provided that your maximum likelihood method picks out a limited value for $q$, it picks out the same $q$ as maximum likelihood without the nonstandard prior: on $[-N,N]$ the prior is just the positive constant $1/(2N)$, so multiplying by it can't move the maximum point.
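Here's a minimal numerical sketch of that claim. It assumes a normal likelihood with known sigma, made-up data, and a large *finite* $N$ standing in for a nonstandard one (floating point has no nonstandard integers, so this only illustrates the idea). The function names are mine, purely for illustration.

```python
import math

def log_likelihood(q, data, sigma=1.0):
    """Log of a normal likelihood p(D|q) with known sigma, up to an additive constant."""
    return -sum((x - q) ** 2 for x in data) / (2.0 * sigma ** 2)

def log_flat_prior(q, N):
    """Log of the prior p(q) = 1/(2N) on [-N, N]; -inf outside that range."""
    return -math.log(2.0 * N) if -N <= q <= N else -math.inf

def argmax_on_grid(f, lo, hi, steps):
    """Return the grid point in [lo, hi] at which f is largest (crude grid search)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=f)

data = [1.2, 0.7, 1.9, 1.4, 0.8]  # hypothetical data, sample mean 1.2
N = 1e12                          # finite stand-in for a nonstandard N

# Maximum likelihood: maximize the log likelihood alone.
q_mle = argmax_on_grid(lambda q: log_likelihood(q, data), -5.0, 5.0, 10000)

# Maximum a posteriori: add the log of the flat prior, a constant on [-N, N].
q_map = argmax_on_grid(
    lambda q: log_likelihood(q, data) + log_flat_prior(q, N), -5.0, 5.0, 10000
)

print(q_mle, q_map)
```

Both searches land on the same grid point, because inside $[-N,N]$ the log prior only shifts the objective by a constant.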

Is this "legitimate"? Well, let me ask you a totally equivalent question: are integrals legitimate? Because the integral of a continuous function $f(x)$ over an interval $[a,b]$ can be defined as follows. Let $N$ be a nonstandard integer, let $dx = (b-a)/N$, and let $\mathrm{st}(x)$ be the standard part function. Then

$$\int_a^b f(x)\,dx = \mathrm{st}\left(\sum_{i=1}^{N} f(a + i\,dx)\,dx\right)$$

and if you don't like the Riemann integral, you can do the Lebesgue integral instead; in that case you need to evaluate $f$ at standard locations:

$$\int_a^b f(x)\,dx = \mathrm{st}\left(\sum_{i=1}^{N} f(\mathrm{st}(a + i\,dx))\,dx\right)$$

Both of these ideas are mathematical constructs in which nonstandard numbers are used to define a mapping from a standard thing to another standard thing. So, maximum likelihood, when it gives a unique maximum, is just maximum a-posteriori for a nonstandard posterior with a nonstandard flat prior, the same way that the integral of a function is just the standard part of a nonstandard sum.
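To make the integral half of the analogy concrete, here's a rough sketch of the Riemann-sum construction with a large finite $N$ playing the role of the nonstandard integer. Taking the standard part has no floating-point equivalent, so the sketch only shows the sum settling near the true value. The function name `riemann_sum` and the choice of $\sin$ on $[0,\pi]$ are my own illustrative assumptions.

```python
import math

def riemann_sum(f, a, b, N):
    """Sum f(a + i*dx) * dx for i = 1..N, with dx = (b - a) / N."""
    dx = (b - a) / N
    return sum(f(a + i * dx) for i in range(1, N + 1)) * dx

# The exact integral of sin over [0, pi] is 2.
approx = riemann_sum(math.sin, 0.0, math.pi, 1_000_000)
print(approx)
```

The larger the $N$, the closer the sum gets to the integral; with a genuinely nonstandard $N$ the difference is infinitesimal and the standard part removes it.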

It's not like this isn't a known thing, that maximum likelihood is the same as maximizing the Bayesian posterior with a "flat" prior. But usually that's taken as a kind of "intuition" because there *is no* (standard) flat prior on the real line. Well, there *is no* standard $dx$ value either, but that doesn't keep us from using nonstandard $dx$ values to define an integral, and it doesn't keep us from using nonstandard priors to define a standard posterior either.

Of course, if you pick some likelihood that can't be normalized... then we're talking about a different story. You should probably rethink your model.