> This approach leaves much to be desired.  I hope that its practitioners 
> start gauging it by the mean squared error of predicted probabilities.

Is the logic here that low MSE of predicted probabilities equals a 
better calibrated model? What about discrimination? Perfect calibration 
implies perfect discrimination, but I often find that you can have two 
competing models, the first with higher discrimination (AUC) and worse 
calibration, and the second the other way round. Which one is the 
better model?
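
To make the question concrete, here is a minimal simulated sketch of the situation I mean. Everything in it is an assumption made up for illustration: the true probabilities are uniform on (0, 1), "model A" is a monotone distortion of the truth (the same ranking, so high AUC, but badly calibrated), and "model B" is a coarse two-level predictor that is calibrated within each level but discriminates less. The helper names `brier` and `auc` are my own, not from any package.

```python
import numpy as np

def brier(y, p):
    """Mean squared error of predicted probabilities (Brier score)."""
    return float(np.mean((p - y) ** 2))

def auc(y, p):
    """AUC via the Mann-Whitney rank statistic, using average ranks for ties."""
    _, inv, counts = np.unique(p, return_inverse=True, return_counts=True)
    cum = np.cumsum(counts)
    avg_rank = cum - (counts - 1) / 2.0  # average 1-based rank of each distinct value
    ranks = avg_rank[inv]
    n_pos = int(np.sum(y == 1))
    n_neg = len(y) - n_pos
    return float((ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

rng = np.random.default_rng(0)
n = 20000
true_p = rng.uniform(0, 1, n)             # assumed true event probabilities
y = (rng.uniform(0, 1, n) < true_p).astype(int)

pred_a = true_p ** 4                      # model A: right ranking, wrong scale
pred_b = np.where(true_p < 0.5, 0.25, 0.75)  # model B: coarse but calibrated

auc_a, auc_b = auc(y, pred_a), auc(y, pred_b)
brier_a, brier_b = brier(y, pred_a), brier(y, pred_b)

print(f"A: AUC={auc_a:.3f}  Brier={brier_a:.3f}")
print(f"B: AUC={auc_b:.3f}  Brier={brier_b:.3f}")
```

In this setup model A should come out ahead on AUC and model B ahead on MSE of predicted probabilities, so the two criteria rank the models in opposite order.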

