[R] standardize columns selectively within a dataframe

Olga Lyashevska olga at herenstraat.nl
Wed Sep 1 18:59:40 CEST 2010


On Wed, 2010-09-01 at 12:42 -0400, David Winsemius wrote:

> I suspect you might have tried (df-mean(df))/sd(df) and gotten
> unsatisfactory results; I know I did.

Yes, indeed! A few times, but why is that?
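
One likely part of the answer is R's recycling rule: a vector of column
means is recycled down the rows (column-major order), not matched up
with the columns. The sketch below illustrates this on a small matrix;
the data and the names m, mu, naive and centered are made up for
illustration and are not from the thread.

```r
# Hypothetical data, not from the original thread
m  <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))
mu <- colMeans(m)            # a = 2, b = 20

# Naive subtraction: mu is recycled in column-major order, so column a
# has 2, 20, 2 subtracted and column b has 20, 2, 20 subtracted.
naive <- m - mu

# sweep() matches each element of mu to its own column instead.
centered <- sweep(m, 2L, mu)

colMeans(naive)     # not zero: the wrong values were subtracted
colMeans(centered)  # zero for both columns
```

Because the matrix length (6) is a multiple of the vector length (2),
the naive version does not even produce a warning.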

> If you had really wanted to persist and do it from first principles,
> so to speak, or perhaps as "homework", then consider the sweep
> operation. It takes an array, a margin (the second argument) and a
> lower-dimensional summary (the third argument), and applies a
> function ("-" by default) that combines each slice along that margin
> with the matching element of the summary. You wanted to work on
> columns, so this would accomplish the subtraction of the column means
> followed by division by the column standard deviations:
> 
>  > sweep(as.matrix(df[ , 1:2]), 2L, colMeans(mm))  # using the default "-" operator
>        a  b
> [1,] -1 -1
> [2,]  0  0
> [3,]  1  1
>  > sweep(sweep(df[ , 1:2], 2L, colMeans(mm)), 2L, apply(mm, 2, sd), "/")
>     a  b
> 1 -1 -1
> 2  0  0
> 3  1  1

I am glad you are talking about sweep here. I have also been trying to
use it, but never managed to fully understand what exactly it does, so
I could not get it to work properly. Very clear explanation, thanks!
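
For the record, here is a minimal sketch of how the two sweep() calls
could be used to standardize only selected columns of a data frame.
The data frame df, the column selection cols and the other names are
hypothetical, chosen for illustration only.

```r
# Hypothetical data frame with two numeric columns and one label column
df   <- data.frame(a = c(1, 2, 3), b = c(10, 20, 40), g = c("x", "y", "z"))
cols <- c("a", "b")                       # columns to standardize

m <- as.matrix(df[, cols])

# Step 1: subtract each column's mean (MARGIN = 2 means "by column")
centered <- sweep(m, 2L, colMeans(m), "-")

# Step 2: divide each column by its standard deviation
standardized <- sweep(centered, 2L, apply(m, 2, sd), "/")

# Write the result back, leaving column g untouched
df[, cols] <- standardized

# Cross-check against scale(), ignoring the attributes scale() adds
all.equal(standardized, scale(m), check.attributes = FALSE)
```

Using apply(m, 2, sd) rather than sd(m) is deliberate: in current
versions of R, sd() on a matrix returns a single value for all
elements, not one value per column.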

> (Your test columns happened to be scaled already and only needed to be  
> centered. This is how scale() does its work, and their help pages have  
> links cross-referencing each other.)
> 
> This is probably a good time to reference Burns's The R Inferno,
> which has an entry for sweep (p 57) as well as tips regarding the
> drop=FALSE maneuver (p 54) that I tried first for this problem but it
> "didn't work".
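
As a small illustration of what the drop=FALSE maneuver does (a sketch
on made-up data, not the code tried in the thread): selecting a single
column normally collapses the result to a plain vector, while
drop = FALSE keeps it as a one-column data frame.

```r
# Hypothetical data, for illustration only
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

v <- df[, 1]                 # collapses to a plain numeric vector
d <- df[, 1, drop = FALSE]   # stays a one-column data frame

is.data.frame(v)   # FALSE
is.data.frame(d)   # TRUE

colMeans(d)        # works, because d still has two dimensions
# colMeans(v) would fail: v has no dimensions to take column means over
```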

Thanks for the references! Your solution with scale() is nice and neat,
but for the sake of learning it is useful to persist.  

Cheers,
Olga


