[Rd] scale(x, center=FALSE) (PR#14219)

Ben Bolker bolker at ufl.edu
Fri Mar 12 19:29:44 CET 2010

  I'm resending this after a week ... I really don't want to nag, but
I also would not like to see this sink below the waves.

  Is there a preferred protocol for requesting comments without nagging
too much?  I would have added a comment to PR#14219 (and was curious to
see whether it had been rejected), but when I went to Bugzilla, bug 14219
no longer seemed to exist, either as open or as closed.  I don't know
whether it got lost or thrown away when the bug system migrated.

     Ben Bolker

 [re: behavior of scale() when center=FALSE and scale=TRUE]

>   Again, I agree with you that the behavior is not optimal, but it is
> very hard to make changes in R when the behavior is sub-optimal rather
> than actually wrong (by some definition).  R-core is very conservative
> about changes that break backward compatibility; I would like it if they
> chose to change the function to use standard deviation rather than
> root-mean-square, but I doubt it will happen (and it would break things
> for any users who are relying on the current definition).


>  I have attached a patch
> file (and append the information below as well) that changes "standard
> deviation" back to "root mean square" and is much more explicit about
> this issue ... I hope R-core will jump in, critique it, and possibly use
> it in some form to improve (?) the documentation ...
>   [PS: I have written that the scaling is equivalent to sd() "if and
> only if" centering was done.  Technically it would also be equivalent if
> the column already had zero mean ...]
--- scale.Rd	(revision 51180)
+++ scale.Rd	(working copy)
@@ -41,13 +41,18 @@
   equal to the number of columns of \code{x}, then each column of
   \code{x} is divided by the corresponding value from \code{scale}.  If
   \code{scale} is \code{TRUE} then scaling is done by dividing the
-  (centered) columns of \code{x} by their standard deviations, and if
+  (centered) columns of \code{x} by their root-mean-squares, and if
   \code{scale} is \code{FALSE}, no scaling is done.
-  The standard deviation for a column is obtained by computing the
-  square-root of the sum-of-squares of the non-missing values in the
-  column divided by the number of non-missing values minus one (whether
-  or not centering was done).
+  The root-mean-square for a (possibly centered)
+  column is defined as
+  \eqn{\sqrt{\sum(x^2)/(n-1)}}{sqrt(sum(x^2)/(n-1))},
+  where \eqn{x} is a vector of the non-missing values
+  and \eqn{n} is the number of non-missing values.
+  If (and only if) centering was done,
+  this is equivalent to \code{sd(x,na.rm=TRUE)}.
+  (To scale by the standard deviations without centering,
+  use \code{scale(x,center=FALSE,scale=apply(x,2,sd,na.rm=TRUE))}.)
   Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
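
As a minimal sketch of the distinction the patched text describes (the
matrix x and the helper rms() are made up for illustration and are not
part of the patch), the following compares the scaling factors that
scale() records in its "scaled:scale" attribute with sd():

```r
# Hypothetical example data: two columns with non-zero means, one NA.
x <- cbind(a = c(1, 2, 3, 4, NA), b = c(10, 20, 30, 40, 50))

# Root-mean-square as defined in the patched help text:
# sqrt(sum(x^2) / (n - 1)), computed over the non-missing values.
rms <- function(v) {
  v <- v[!is.na(v)]
  sqrt(sum(v^2) / (length(v) - 1))
}

rms_cols <- apply(x, 2, rms)
sd_cols  <- apply(x, 2, sd, na.rm = TRUE)

# Without centering, scale = TRUE divides by the root-mean-square,
# which differs from sd() because the column means are not zero.
s_uncentered <- scale(x, center = FALSE, scale = TRUE)

# With centering, the same formula applied to the centered columns
# coincides with sd(x, na.rm = TRUE).
s_centered <- scale(x, center = TRUE, scale = TRUE)

# Workaround suggested in the patch: pass the standard deviations
# explicitly to scale by them without centering.
s_sd <- scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE))

# Side-by-side view of the scaling factors used in each case.
print(rbind(rms          = rms_cols,
            sd           = sd_cols,
            uncentered   = attr(s_uncentered, "scaled:scale"),
            centered     = attr(s_centered, "scaled:scale"),
            sd_no_center = attr(s_sd, "scaled:scale")))
```

The "uncentered" row should match "rms", while the "centered" and
"sd_no_center" rows should match "sd".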

 (Bump re: suggested update to scale.Rd.  Is this under
consideration? I'll stop pestering if it's considered
unacceptable; I just don't want it to vanish without a trace ...)

Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / people.biology.ufl.edu/bolker
GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc
