[R] Scaling - does it get any better results than not scaling?

Wed Jul 18 07:36:02 CEST 2018

My thanks to all contributors, and while I was not in the right place, I certainly got the answers I needed. My students will benefit, so thank you all.

Regards,
Michael Thompson M.Prof.Studies Data Science
09 975 4678
Senior Lecturer, Digital Technologies
Manukau Campus
We all, like sheep, have gone astray Isaiah 53
Personal profile: https://www.manukau.ac.nz/about/faculties-schools/business-and-information-technology/more-information-for-students/lecturer-profiles/michael-thompson

From: Bert Gunter [mailto:bgunter.4567 using gmail.com]
Sent: Wednesday, 18 July 2018 3:02 AM
To: Roger Koenker <rkoenker using illinois.edu>
Cc: Michael Thompson <michael.thompson using manukau.ac.nz>; r-help using r-project.org
Subject: Re: [R] Scaling - does it get any better results than not scaling?

Prof. Koenker's response probably settles the matter, but if not, this thread should really be taken offlist, as it is primarily about statistics and not R programming.
stats.stackexchange.com might be an alternative place to post; indeed, I suspect the issue has already been addressed in their archives.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Jul 17, 2018 at 1:02 AM, Roger Koenker <rkoenker using illinois.edu> wrote:
In certain fields this sort of standardization has become customary based on some sort of (misguided) notion that it
induces “normality.”  For example, in anthropometric studies based on the international Demographic and Health
Surveys (DHS) children’s heights are often transformed to Z-scores prior to subsequent analysis under the dubious
presumption that variability around the Z-scores at various ages will be Gaussian.  In my experience this is rarely
justified, and analysts would be better off modeling the original data rather than doing the preliminary transformation.
This is discussed in further detail here:  https://projecteuclid.org/euclid.bjps/1313973394.
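
The simplest version of the point above can be checked directly: a Z-score is a linear transformation, so it changes location and scale but not the shape of a distribution. The sketch below uses simulated exponential data as a stand-in (an assumption for illustration; it is not DHS data and does not reproduce the age-specific reference standards used there), and `skewness` is a helper defined here, not a base R function.

```r
# Simulated right-skewed data (illustrative assumption, not DHS heights).
set.seed(1)
x <- rexp(1000)

# Z-score: subtract the mean, divide by the standard deviation.
z <- (x - mean(x)) / sd(x)

# Sample skewness; a helper defined for this sketch only.
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

# Location and scale are now 0 and 1 (up to rounding error) ...
c(mean = mean(z), sd = sd(z))

# ... but the skewness, and hence the non-normal shape, is unchanged.
all.equal(skewness(x), skewness(z))
```

If the raw data are skewed, the Z-scores are equally skewed, so any assumption of Gaussian variability still has to be checked rather than presumed.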

> On Jul 17, 2018, at 5:53 AM, Michael Thompson <michael.thompson using manukau.ac.nz> wrote:
>
> Hi,
> I seem to remember from classes that one effect of scaling / standardising data was to get better results in any analysis. But what I'm seeing when I study various explanations on scaling is that we get exactly the same results, just that when we look at standardised data it's easier to see proportionate effects.
> This is all very well for the data scientist to further investigate, but from a practical point of view (especially IF it doesn't improve the accuracy of the result), surely it adds complication to 'telling the story' of the model to non-DS people?
> So, is scaling a technique for the DS to use to find effects, while eventually delivering a non-scaled version to the users?
> I'd like to be able to give the true story to my students, not some fairy story based on my misunderstanding. Hope you can help with this.
> Michael
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
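
The observation in the quoted question, that scaling gives "exactly the same results", can be illustrated for ordinary least squares with a short sketch. The built-in `mtcars` data and the choice of `wt` and `hp` as predictors are assumptions made purely for illustration; they are not from the thread.

```r
# Fit on the original units (mtcars ships with R; model choice is illustrative).
raw <- lm(mpg ~ wt + hp, data = mtcars)

# Same model with both predictors standardised to mean 0 and sd 1.
std_data <- transform(mtcars,
                      wt = as.numeric(scale(wt)),
                      hp = as.numeric(scale(hp)))
std <- lm(mpg ~ wt + hp, data = std_data)

# Fitted values and R-squared agree between the two fits.
all.equal(fitted(raw), fitted(std))
all.equal(summary(raw)$r.squared, summary(std)$r.squared)

# Each standardised slope is the raw slope times that predictor's sd,
# so the raw-unit coefficients can always be recovered for reporting.
all.equal(coef(std)[-1], coef(raw)[-1] * sapply(mtcars[c("wt", "hp")], sd))
```

Each comparison should return TRUE: the two fits describe the same model in different units. This invariance is specific to least-squares fits of this kind; methods that depend on distances between observations or that penalise coefficient size (for example k-nearest neighbours or ridge regression) can give different results after scaling.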
