[R] standardize columns selectively within a dataframe

David Winsemius dwinsemius at comcast.net
Wed Sep 1 18:42:18 CEST 2010


On Sep 1, 2010, at 10:42 AM, David Winsemius wrote:

>
> On Sep 1, 2010, at 10:35 AM, Olga Lyashevska wrote:
>
>> Dear all,
>>
>> I have a dataframe:
>> df<-data.frame(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
>>
>> I want to obtain a new dataframe with columns a and b being
>> standardized ((x-mean(x))/sd(x)); the other two columns (c,d) I want
>> to leave unchanged. What is the best way to achieve this? I have been
>> trying to use subscripts but did not succeed so far.
>
> > df[ , 1:2] <- scale(df[ , 1:2])
> > df
>   a  b c  d
> 1 -1 -1 7 10
> 2  0  0 8 11
> 3  1  1 9 12

I suspect you might have tried (df-mean(df))/sd(df) and gotten
unsatisfactory results; I know I did. If you really want to persist
and do it from first principles, so to speak, or perhaps as
"homework", then consider sweep(). Its first argument is the array to
work on, its second names the dimension (margin) to work along, and
its third is an object of lower dimension, such as a vector of column
means. sweep() applies a function ("-" by default) between the array
and that third argument, repeating it across the specified dimension.
You wanted to work on columns (margin 2), so this accomplishes the
subtraction of the column means followed by division by the column
standard deviations:

 > mm <- as.matrix(df[ , 1:2])
 > sweep(mm, 2L, colMeans(mm)) # using the default "-" operator
       a  b
[1,] -1 -1
[2,]  0  0
[3,]  1  1
 > sweep(sweep(df[ , 1:2], 2L, colMeans(mm)), 2L, apply(mm, 2, sd), "/")
    a  b
1 -1 -1
2  0  0
3  1  1

(Your test columns happened to have a standard deviation of 1 already,
so they only needed to be centered. sweep() is how scale() does its
work, and the two help pages cross-reference each other.)
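
To see the centering and the scaling both do something, here is a
minimal sketch on made-up data whose columns do not already have a
standard deviation of 1. The data frame `dat` and its values are
invented for illustration, and apply(m, 2, sd) is just one way to get
the column standard deviations.

```r
# Hypothetical data: columns a and b are not on unit scale
dat <- data.frame(a = c(2, 4, 9), b = c(10, 30, 20),
                  c = c(7, 8, 9), d = c(10, 11, 12))
m <- as.matrix(dat[ , c("a", "b")])

# Center each column, then divide each column by its sd
centered <- sweep(m, 2L, colMeans(m))                  # default FUN is "-"
by_sweep <- sweep(centered, 2L, apply(m, 2, sd), "/")

# scale() adds "scaled:center" and "scaled:scale" attributes,
# so compare the values only
by_scale <- scale(m)
same <- all.equal(by_sweep, by_scale, check.attributes = FALSE)

# Write the standardized columns back; c and d are left untouched
dat[ , c("a", "b")] <- by_sweep
```

If the two routes agree, `same` is TRUE, and `dat` then holds the
standardized a and b next to the original c and d.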

This is probably a good time to reference Burns' The R Inferno, which
has an entry for sweep (p 57) as well as tips regarding the drop=FALSE
maneuver (p 54), which I tried first for this problem but which
"didn't work".
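
For anyone who has not met drop=FALSE, a minimal sketch of what it
changes when a single column is selected. This is not a reconstruction
of the attempt mentioned above, only an illustration on a made-up data
frame `dat`.

```r
# Hypothetical two-column data frame
dat <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

one_col_vec <- dat[ , 1]                 # plain numeric vector, no dim()
one_col_df  <- dat[ , 1, drop = FALSE]   # still a 3 x 1 data frame

is.null(dim(one_col_vec))
dim(one_col_df)

# sweep() works along a dimension, so keep the dimensions
# when only one column is selected
centered <- sweep(as.matrix(one_col_df), 2L, colMeans(one_col_df))
```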
-- 

David Winsemius, MD
West Hartford, CT


