[BioC] Help on alternative and efficient data frame manipulation

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Wed Dec 28 22:55:35 CET 2011


Steve,

Thanks for spotting the bug! Luckily, the code with the real data is
correct!

As you suspected, fixing the bug did not change the speed much. Thanks
again!
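
For the archive, here is a minimal sketch of the corrected matrix approach
from the quoted message below. The example data frame is made up (three
annotation columns named chr, start and end, followed by 97 count columns),
and the kept columns are taken as mydata[, -id] so that no column is
duplicated. Both choices are assumptions for illustration, not the layout
of the real data.

```r
# Hypothetical example data: 3 annotation columns plus 97 numeric columns.
set.seed(1)
n <- 100
mydata <- data.frame(
  chr   = rep("chr1", n),
  start = seq_len(n),
  end   = seq_len(n) + 10L,
  matrix(rpois(n * 97, lambda = 1), nrow = n)
)

id <- 4:ncol(mydata)              # numeric columns to binarize (assumed layout)

timing <- system.time({
  m <- as.matrix(mydata[, id])    # select the numeric columns (not -(id))
  m[m > 0] <- 1                   # every positive value becomes 1
  ans <- cbind(mydata[, -id], as.data.frame(m))
})
print(timing)
```

The only change from the original test code is mydata[, id] in place of
mydata[, -(id)]; the timings will of course depend on the machine and on
the size of the data.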

Best regards,

Julie


On 12/28/11 4:47 PM, "Steve Lianoglou" <mailinglist.honeypot at gmail.com>
wrote:

> Hi Julie,
> 
> On Wed, Dec 28, 2011 at 3:41 PM, Zhu, Lihua (Julie)
> <Julie.Zhu at umassmed.edu> wrote:
>> Steve,
>> 
>> Converting to a matrix resulted in a much larger increase in speed compared
>> with treating the columns as a list. Here are the comparison results for a 100
>> by 100 data frame.
> 
> Cool ... I guess the conversion to an intermediary matrix will take
> more temporary memory, but if you can afford it, then great.
> 
> That having been said, though, it looks like there's a small bug in
> your test code, no?
> 
> See this part here:
> 
>> id <- 4:ncol(mydata)
> [snip]
>> system.time({m <- as.matrix(mydata[, -(id)])
>>  m[m > 0] <- 1
>>  ans <- cbind(mydata[,1:4], as.data.frame(m))})
>>   user  system elapsed
>>  0.006   0.000   0.009
> 
> It looks like your temporary `m` matrix is the opposite of what you
> want, no? Shouldn't the assignment to `m` be:
> 
> m <- as.matrix(mydata[, id])
> 
> maybe?
> 
> Not sure how much of a difference this will make in the timings again,
> but perhaps it's something worth seeing ... I reckon both methods are
> fast enough that the differences between the two aren't worth
> stressing over either way.
> 
>> Many thanks for your great suggestions!
> 
> Sure thing .. glad that it was helpful,
> 
> -steve


