[R] Scaling - does it get any better results than not scaling?

Roger Koenker
Tue Jul 17 10:02:15 CEST 2018


In certain fields this sort of standardization has become customary based on some sort of (misguided) notion that it
induces “normality.”  For example, in anthropometric studies based on the international Demographic and Health
Surveys (DHS), children’s heights are often transformed to Z-scores prior to subsequent analysis under the dubious
presumption that variability around the Z-scores at various ages will be Gaussian.  In my experience this is rarely
justified, and analysts would be better off modeling the original data rather than doing the preliminary transformation.
This is discussed in further detail here:  https://projecteuclid.org/euclid.bjps/1313973394.
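
A minimal sketch in R may help make both points concrete. It uses simulated data (the variables x, y, z and the helper skew() are made up for illustration, not taken from the DHS or from the paper linked above) and assumes plain sample standardization via scale() and an ordinary least squares fit via lm(). It checks that (1) a Z-transformation leaves the skewness of a variable unchanged, so it cannot by itself produce Gaussian data, and (2) standardizing a predictor leaves the fitted values and R-squared of a linear model unchanged, rescaling only the slope.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 200
x <- rexp(n)                      # a deliberately skewed, made-up predictor
y <- 1 + 2 * x + rnorm(n)         # made-up linear response

# Hypothetical helper: moment-based sample skewness
skew <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

# 1. Standardizing is a linear map, so the shape of the distribution is unchanged.
z <- as.numeric(scale(x))         # (x - mean(x)) / sd(x)
all.equal(skew(x), skew(z))       # skewness before and after the Z-transformation

# 2. For least squares, scaling a predictor reparameterizes the fit
#    without changing it.
fit_raw <- lm(y ~ x)
fit_std <- lm(y ~ z)
all.equal(unname(fitted(fit_raw)), unname(fitted(fit_std)))        # fitted values
all.equal(summary(fit_raw)$r.squared, summary(fit_std)$r.squared)  # R-squared
all.equal(unname(coef(fit_std)["z"]),
          unname(coef(fit_raw)["x"]) * sd(x))                      # slope rescaled by sd(x)
```

Each all.equal() call compares the two quantities up to numerical tolerance. The invariance in the second part is shown here only for unpenalized least squares; the sketch makes no claim about other fitting methods.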

> On Jul 17, 2018, at 5:53 AM, Michael Thompson <michael.thompson using manukau.ac.nz> wrote:
> 
> Hi,
> I seem to remember from classes that one effect of scaling / standardising data was to get better results in any analysis. But what I'm seeing when I study various explanations on scaling is that we get exactly the same results, just that when we look at standardised data it's easier to see proportionate effects.
> This is all very well for the data scientist to investigate further, but from a practical point of view (especially if it doesn't improve the accuracy of the result), surely it adds complication to 'telling the story' of the model to non-DS people?
> So, is scaling a technique for the DS to use to find effects, while eventually delivering a non-scaled version to the users?
> I'd like to be able to give the true story to my students, not some fairy story based on my misunderstanding. Hope you can help with this.
> Michael
> 