[Rd] scale(x, center=FALSE) (PR#14219)
Ben Bolker
bolker at ufl.edu
Fri Mar 5 00:06:16 CET 2010
Ben Bolker <bolker <at> ufl.edu> writes:
[re: behavior of scale() when center=FALSE and scale=TRUE]
> Again, I agree with you that the behavior is not optimal, but it is
> very hard to make changes in R when the behavior is suboptimal rather
> than actually wrong (by some definition). R-core is very conservative
> about changes that break backward compatibility; I would like it if they
> chose to change the function to use standard deviation rather than
> root-mean-square, but I doubt it will happen (and it would break things
> for any users who are relying on the current definition).
[snip]
> I have attached a patch
> file (and append the information below as well) that changes "standard
> deviation" back to "root mean square" and is much more explicit about
> this issue ... I hope R-core will jump in, critique it, and possibly use
> it in some form to improve (?) the documentation ...
>
> [PS: I have written that the scaling is equivalent to sd() "if and
> only if" centering was done. Technically it would also be equivalent if
> the column already had zero mean ...]
>
===================================================================
--- scale.Rd (revision 51180)
+++ scale.Rd (working copy)
@@ -41,13 +41,18 @@
 equal to the number of columns of \code{x}, then each column of
 \code{x} is divided by the corresponding value from \code{scale}. If
 \code{scale} is \code{TRUE} then scaling is done by dividing the
- (centered) columns of \code{x} by their standard deviations, and if
+ (centered) columns of \code{x} by their root-mean-squares, and if
 \code{scale} is \code{FALSE}, no scaling is done.
-
- The standard deviation for a column is obtained by computing the
- square-root of the sum-of-squares of the non-missing values in the
- column divided by the number of non-missing values minus one (whether
- or not centering was done).
+
+ The root-mean-square for a (possibly centered)
+ column is defined as
+ \eqn{\sqrt{\sum(x^2)/(n-1)}}{sqrt(sum(x^2)/(n-1))},
+ where \eqn{x} is a vector of the non-missing values
+ and \eqn{n} is the number of non-missing values.
+ If (and only if) centering was done,
+ this is equivalent to \code{sd(x,na.rm=TRUE)}.
+ (To scale by the standard deviations without centering,
+ use \code{scale(x,center=FALSE,scale=apply(x,2,sd,na.rm=TRUE))}.)
 }
 \references{
 Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
(Bump re: suggested update to scale.Rd . Is this under
consideration? I'll stop pestering if it's considered
unacceptable, just don't want it to vanish without a trace ...)
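
For anyone skimming the thread, here is a minimal sketch of the behavior
the patch tries to document. The matrix x and the helper rms() are made
up for illustration only; rms() simply follows the formula given in the
patch, and the checks read the divisors back from the "scaled:scale"
attribute that scale() attaches to its result.

```r
# Hypothetical example data: two columns with clearly non-zero means.
x <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 50))

# Illustrative helper (not part of base R): root-mean-square with an
# n - 1 denominator, following the formula in the proposed patch.
rms <- function(v) {
  v <- v[!is.na(v)]
  sqrt(sum(v^2) / (length(v) - 1))
}

# Without centering, scale() divides each column by its RMS, not by sd().
s_rms <- scale(x, center = FALSE)
stopifnot(isTRUE(all.equal(attr(s_rms, "scaled:scale"), apply(x, 2, rms))))

# With centering, the RMS of the centered column coincides with sd().
s_ctr <- scale(x)
stopifnot(isTRUE(all.equal(attr(s_ctr, "scaled:scale"), apply(x, 2, sd))))

# Workaround from the patch: scale by sd() without centering.
s_sd <- scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE))
stopifnot(isTRUE(all.equal(attr(s_sd, "scaled:scale"), apply(x, 2, sd))))
```

The three stopifnot() calls pass silently when the behavior matches the
description above; the first one is the case that the current wording
"standard deviation" gets wrong.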