[R] Scaling - does it get any better results than not scaling?

Tue Jul 17 18:29:11 CEST 2018

This is a variant of FAQ 7.31 on rounding.

For hand arithmetic, for example the variance of c(29,30,31), it was
easier to subtract the mean and work with c(-1,0,1).
On limited-precision computers, working directly with many-digit
numbers can cause rounding in intermediate steps and catastrophic
cancellation.

For more information, see FAQ 7.31 in the file
system.file("../../doc/FAQ")
on your computer.  Open it in your favorite text editor.
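
If the relative path above does not resolve on your installation, the
following is a minimal sketch of another way to find the file.  It
assumes that your R installation ships the plain-text FAQ as a file
named FAQ in its doc directory; the variable names faq and hits are
only illustrative.

```r
## Build the path from R's own doc directory instead of a relative path.
faq <- file.path(R.home("doc"), "FAQ")

## Only read the file if it is really there (assumption: plain-text FAQ).
hits <- if (file.exists(faq)) {
  grep("7.31", readLines(faq, warn = FALSE), fixed = TRUE, value = TRUE)
} else {
  character(0)
}

faq   ## path that was checked
hits  ## lines mentioning 7.31, or character(0) if the file is absent
```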

Here is a simple example that shows catastrophic cancellation, using
5-bit arithmetic rather than R's standard double precision with its
53-bit significand.

library(Rmpfr)

NN <- 29:31
NN
NN^2
formatBin(NN)
formatBin(NN^2)

## 53 bit precision (double precision)
SSq <- NN[1]^2 + NN[2]^2 + NN[3]^2
SSq
CorrSSq <- SSq - ((NN[1]+NN[2]+NN[3])^2)/3
CorrSSq ## right answer
formatBin(CorrSSq)

## 5 bit precision
ONE <- mpfr(1, precBits=5)
NNO <- NN*ONE
NNO
NNO^2 ## note loss of precision
formatBin(NNO) ## 5-bit numbers.  Their squares require 10 bits.
formatBin(NNO^2) ## 10-bit squares rounded to 5 bits

SSqO <- NNO[1]^2 + NNO[2]^2 + NNO[3]^2
SSqO
CorrSSqO <- SSqO - ((NNO[1]+NNO[2]+NNO[3])^2)/3
CorrSSqO ## very wrong answer from catastrophic cancellation
formatBin(CorrSSqO)

## "normalizing" NNO by subtracting the mean (30), still 5 bit precision
NNOm30 <- NNO-30
NNOm30
NNOm30^2
SSqOm30 <- NNOm30[1]^2 + NNOm30[2]^2 + NNOm30[3]^2  ## 5 bit precision
SSqOm30 ## right answer, even with low-precision arithmetic
formatBin(SSqOm30)

formatBin(NNOm30)
formatBin(NNOm30^2)
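
The same effect can be seen in ordinary double precision once the
numbers have enough digits.  The sketch below is not part of the 5-bit
demonstration above; it is an additional illustration that shifts the
same three values by 1e9 (an arbitrary offset chosen here) and compares
the one-pass "sum of squares minus correction" formula with the
centered version.  The names x, naive and centered are illustrative.

```r
## Same data as above, shifted by a large constant (assumed offset 1e9).
x <- 1e9 + c(29, 30, 31)

## One-pass formula: the squares are near 1e18, beyond the integers a
## double can represent exactly, so the final subtraction cancels badly.
naive <- sum(x^2) - sum(x)^2 / length(x)

## Centered formula: subtract the mean first, then square small numbers.
centered <- sum((x - mean(x))^2)

naive     ## not the true value 2
centered  ## 2
var(x)    ## compare with R's built-in variance
```

Centering costs one extra pass over the data, but the quantities being
squared stay small, so no digits are lost before the subtraction.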

On Tue, Jul 17, 2018 at 12:53 AM, Michael Thompson
<michael.thompson using manukau.ac.nz> wrote:
> Hi,
> I seem to remember from classes that one effect of scaling / standardising data was to get better results in any analysis. But what I'm seeing when I study various explanations on scaling is that we get exactly the same results, just that when we look at standardised data it's easier to see proportionate effects.
> This is all very well for the data scientist to further investigate, but from a practical point of view (especially IF it doesn't improve the accuracy of the result), surely it adds complication to 'telling the story' of the model to non-DS people?
> So, is scaling a technique for the DS to use to find effects, while eventually delivering a non-scaled version to the users?
> I'd like to be able to give the true story to my students, not some fairy story based on my misunderstanding. Hope you can help with this.
> Michael