Professor Flannery at Georgia Tech has solved the problem of reconciling Gauss' principle, which is directly applicable to nonholonomic systems, with the Lagrange-D'Alembert principle, whose application to nonholonomic systems is tricky. The paper comes in roughly two versions: the full version and a simpler, easier-to-read version.

I've just skimmed each of these, but it seems that the key bit is the traditional assumption that $\delta \dot q = \frac{d}{dt}(\delta q)$ where $\delta q$ represents a perturbation of the coordinate path of a system, and $\delta \dot q$ represents a perturbation of the velocities implied by the perturbed coordinate path.  In other words, whether we perturb a path and then take a time derivative to get the velocity, or we perturb the velocity directly, we should wind up with the same perturbation. This does not obviously need to hold, but it is assumed in the traditional development.

Instead, Flannery proposes that for a system with a general velocity constraint $g(q,\dot q,\ldots)=0$, the relationship $\delta g - \frac{d}{dt}\left[\frac{\partial g}{\partial \dot q} \delta q\right]=0$ is the one that needs to hold.
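
To see how this condition relates to the traditional commutation rule, here is a quick sketch using nothing but the chain and product rules. It is not taken from Flannery's papers, and it assumes a single coordinate $q$ and a constraint of the form $g(q,\dot q,t)$, with variations taken at fixed time:

$$
\delta g = \frac{\partial g}{\partial q}\,\delta q + \frac{\partial g}{\partial \dot q}\,\delta \dot q,
\qquad
\frac{d}{dt}\left[\frac{\partial g}{\partial \dot q}\,\delta q\right]
= \left(\frac{d}{dt}\frac{\partial g}{\partial \dot q}\right)\delta q + \frac{\partial g}{\partial \dot q}\,\frac{d}{dt}(\delta q)
$$

$$
\delta g - \frac{d}{dt}\left[\frac{\partial g}{\partial \dot q}\,\delta q\right]
= \left[\frac{\partial g}{\partial q} - \frac{d}{dt}\frac{\partial g}{\partial \dot q}\right]\delta q
+ \frac{\partial g}{\partial \dot q}\left[\delta \dot q - \frac{d}{dt}(\delta q)\right]
$$

If the traditional rule $\delta \dot q = \frac{d}{dt}(\delta q)$ is imposed, the last term drops out and the condition reduces to $\left[\frac{\partial g}{\partial q} - \frac{d}{dt}\frac{\partial g}{\partial \dot q}\right]\delta q = 0$. For a constraint that is really a holonomic one in disguise, $g = \frac{d}{dt} f(q,t) = \frac{\partial f}{\partial q}\dot q + \frac{\partial f}{\partial t}$, that bracket vanishes identically, so the two requirements agree. For a general $g$ it does not vanish, and that seems to be where the two can part ways.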

In other words, there are subtleties, but he has worked them out and shown that they produce equations equivalent to those obtained from Gauss' principle. I need some time to digest these papers, but they are a welcome advance: previously I had only read papers suggesting that the same equations of motion Flannery derives should hold for all nonlinear nonholonomic constraints, without effectively justifying why.