[BioC] Help on alternative and efficient data frame manipulation

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Dec 28 22:47:20 CET 2011


Hi Julie,

On Wed, Dec 28, 2011 at 3:41 PM, Zhu, Lihua (Julie)
<Julie.Zhu at umassmed.edu> wrote:
> Steve,
>
> Converting to a matrix resulted in a much larger increase in speed compared
> with treating the columns as list. Here are the comparison results for a 100
> by 100 data frame.

Cool ... I guess the conversion to an intermediary matrix will take
more temp memory to do, but if you can afford it, then great.

That having been said, though, it looks like there's a small bug in
your test code, no?

See this part here:

> id <- 4:ncol(mydata)
[snip]
> system.time({m <- as.matrix(mydata[, -(id)])
>  m[m > 0] <- 1
>  ans <- cbind(mydata[,1:4], as.data.frame(m))})
>   user  system elapsed
>  0.006   0.000   0.009

It looks like your temporary `m` matrix is the opposite of what you
want, no? The minus sign in `mydata[, -(id)]` drops the columns in `id`
and keeps only the first three. Shouldn't the assignment to `m` be:

m <- as.matrix(mydata[, id])

maybe?
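
To make that concrete, here is a rough sketch of what I'd expect the
corrected block to look like. The `mydata` below is a made-up stand-in
(I don't have your real data), and I've used `mydata[, -id]` for the
untouched columns so the two halves stay consistent with whatever `id`
is. That last bit is my assumption about what you want, not something
from your code:

```r
# Hypothetical stand-in for the real data: a 100 x 100 data frame of counts
set.seed(1)
mydata <- data.frame(matrix(rpois(100 * 100, lambda = 1), nrow = 100))

id <- 4:ncol(mydata)            # columns to binarize, as in the quoted code

m <- as.matrix(mydata[, id])    # keep the columns in `id` (no minus sign)
m[m > 0] <- 1                   # any positive value becomes 1
# Assumption: everything *not* in `id` is carried over unchanged
ans <- cbind(mydata[, -id], as.data.frame(m))
```

With `id` starting at 4, binding `mydata[, 1:4]` back on would carry
column 4 over twice, which is why the sketch uses `-id` there instead.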

Not sure how much of a difference this will make in the timings,
but perhaps it's something worth checking ... I reckon both methods are
fast enough that the differences between the two aren't worth
stressing over either way.
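
If you do want to re-time things, something along these lines might
do. Again, `mydata` is invented here, and the "list" version is only my
guess at how the column-wise approach looked on your end, so treat it
as a sketch rather than a faithful rerun of your test:

```r
# Hypothetical stand-in for the real data
set.seed(1)
mydata <- data.frame(matrix(rpois(100 * 100, lambda = 1), nrow = 100))
id <- 4:ncol(mydata)

# Method 1 (assumed): treat the data frame as a list of columns
t_list <- system.time({
  ans_list <- mydata
  ans_list[id] <- lapply(mydata[id], function(x) { x[x > 0] <- 1; x })
})

# Method 2: go through an intermediate matrix (with the corrected subset)
t_mat <- system.time({
  m <- as.matrix(mydata[, id])
  m[m > 0] <- 1
  ans_mat <- cbind(mydata[, -id], as.data.frame(m))
})

# Show user/system/elapsed for both, and check the results agree
print(rbind(list = t_list, matrix = t_mat)[, 1:3])
stopifnot(all(as.matrix(ans_list) == as.matrix(ans_mat)))
```

At 100 by 100 a single `system.time` call is mostly noise, so you would
probably want a bigger data frame or a few repeats before reading much
into the numbers.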

> Many thanks for your great suggestions!

Sure thing .. glad that it was helpful,

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


