[R] unexpected behaviour with ddply and colwise

Stuart Andrews stu.andrews at gmail.com
Thu Apr 8 00:39:45 CEST 2010


Ahhh, I see my error, thanks to Steve and others who mailed me off list.

Reading a little too quickly, I misinterpreted the help for
ddply, in particular its second argument:

 >	".variables: variables to split data frame by, as quoted variables, a
 >		formula or character vector"

I assumed that I could select entire columns (i.e. the *variables*
that make up my data.frame) with this argument, when in fact it names
the columns whose values are used to split the rows into groups.
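For comparison, here is a minimal sketch of both tasks, assuming a current plyr installation. Plain column totals need no row splitting at all, while `.variables` is meant to name grouping columns. Splitting by "a" is only an illustrative choice, and `totals` and `by_a` are made-up names.

```r
library(plyr)

# Toy data from the original post: 20 rows, 5 logical columns
set.seed(1234)
aa <- as.data.frame(matrix(rnorm(100) > 0.3, nrow = 20))
names(aa) <- c("a", "b", "c", "d", "e")

# Column totals (what was actually wanted): no row splitting is involved,
# so ddply is not needed; colwise(sum)(aa) gives the same numbers.
totals <- colSums(aa)

# .variables names the columns whose values define the row groups.
# Here the rows are split by the value of "a" (FALSE / TRUE), and the
# remaining columns are summed within each group.
by_a <- ddply(aa, "a", colwise(sum, .(b, c, d, e)))
```

Summing `by_a` over its two rows gives back the totals for b to e, which is a quick way to check that the split only partitions the rows.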

Thx again,
- S.


On Apr 7, 2010, at 6:13 PM, Steve Lianoglou wrote:

> Howdy,
>
> I'm no plyr master, but here's my 2 cents ...
>
> On Wed, Apr 7, 2010 at 5:15 PM, Stuart Andrews  
> <stu.andrews at gmail.com> wrote:
>> Hi,
>>
>> I am confused by results from:
>>
>>> ddply(aa, names(aa), colwise(sum))
>>
>> I thought ddply was just calling colwise(sum)() on each column.
>> However, ddply() returns a 13 x 5 result!!
>>
>> The general result I expected is similar to that of apply(), or
>> using colwise(sum)() alone. Shouldn't ddply() produce the same?
>
> Not sure what exactly is happening, but I wouldn't expect ddply
> to produce the same result as the example you gave, since the second
> arg to ddply determines how the aa data.frame should be split (row-wise)
> before the colwise(...) doohickey is called.
>
> I'm not sure, but what are you trying to get at by row-wise splitting
> `aa` by c('a', 'b', 'c', 'd', 'e') [i.e. names(aa)]?
>
>>
>> Thanks in advance for your help,
>> - Stuart Andrews
>>
>>
>>> set.seed(1234)
>>> aa = as.data.frame(matrix(rnorm(100)>0.3,nrow=20))
>>> names(aa) = c('a','b','c','d','e')
>>> head(aa)
>>       a     b     c     d     e
>> 1 FALSE FALSE FALSE  TRUE  TRUE
>> 2  TRUE  TRUE FALSE  TRUE FALSE
>> 3  TRUE  TRUE FALSE  TRUE  TRUE
>> 4  TRUE FALSE FALSE  TRUE FALSE
>> 5  TRUE FALSE FALSE  TRUE FALSE
>> 6 FALSE FALSE FALSE FALSE  TRUE
>>
>>> ddply(aa, names(aa), colwise(sum))
>>    a b c d e
>> 1  0 0 0 0 0
>> 2  0 0 0 0 2
>> 3  0 0 0 4 0
>> 4  0 0 0 1 1
>> 5  0 0 1 0 0
>> 6  0 0 2 0 2
>> 7  0 0 1 1 0
>> 8  0 2 0 0 0
>> 9  0 1 0 0 1
>> 10 1 0 0 0 0
>> 11 2 0 0 0 2
>> 12 1 0 0 1 0
>> 13 1 0 0 1 1
>>
>>> apply(as.matrix(aa),2,sum)
>> a b c d e
>> 5 3 4 8 9
>>
>>> colwise(sum)(aa)
>>   a b c d e
>> 1 5 3 4 8 9
>>
>>
>> ... Isn't ddply() just doing something like this for each column??
>>
>>> colwise(sum)(aa[, 1, drop = FALSE])
>>   a
>> 1 5
>
> That's what colwise is doing for each column of the data.frame it's
> working on ... ddply does the split-by-row/apply/merge magic on the
> data frame and hands colwise smaller chunks (subsets of rows) of `aa`
> to work on, one at a time...
>
> So, to summarize, I think you just need to figure out the correct 2nd
> arg to ddply for your specific problem.
>
> -steve
>
> -- 
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
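The split/apply/combine behaviour Steve describes can be mimicked in base R, which makes it clearer why splitting by names(aa) produced 13 rows instead of one. This is a rough sketch, not plyr's actual implementation; `groups`, `pieces`, `per_group`, `one_piece` and `whole` are illustrative names.

```r
# Toy data from the original post
set.seed(1234)
aa <- as.data.frame(matrix(rnorm(100) > 0.3, nrow = 20))
names(aa) <- c("a", "b", "c", "d", "e")

# Split: with all five columns as grouping variables, every distinct
# TRUE/FALSE row pattern becomes its own group (unused combinations dropped).
groups <- interaction(aa, drop = TRUE)
pieces <- split(aa, groups)              # list of row subsets of aa

# Apply + combine: sum each column within each piece, one row per group.
per_group <- t(sapply(pieces, colSums))

# Splitting by nothing leaves a single piece, so the same steps give
# the plain column totals.
one_piece <- split(aa, rep(1, nrow(aa)))
whole <- t(sapply(one_piece, colSums))
```

Each row of `per_group` counts how often one row pattern occurs, column by column; adding those rows back up recovers the column totals in `whole`.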


