[BioC] Help on alternative and efficient data frame manipulation

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Wed Dec 28 21:41:01 CET 2011


Steve,

Converting to a matrix gave a much larger speedup than treating the columns
as a list. Here are the timings for a 100 by 100 data frame.

id <- 4:ncol(mydata)
system.time(for (i in id) {
  mydata[[i]] <- ifelse(mydata[[i]] > 0, 1, mydata[[i]])
})
   user  system elapsed
  0.034   0.000   0.037

system.time(for (i in id) {
  mydata[, i] <- ifelse(mydata[, i] > 0, 1, mydata[, i])
})
   user  system elapsed
  0.038   0.003   0.042
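
The list-style update can also be written without an explicit loop. This is
only a minimal sketch on a small synthetic data frame (the name demo and its
contents are hypothetical stand-ins for mydata), and I have not timed it:

```r
# Hypothetical stand-in for mydata: 3 annotation columns, then numeric columns.
set.seed(1)
demo <- data.frame(a = letters[1:5], b = 1:5, c = 5:1,
                   matrix(rnorm(5 * 4), nrow = 5))
id <- 4:ncol(demo)

# Replace every positive value in columns 4..ncol with 1, one column at a
# time, and assign all modified columns back in a single step.
demo[id] <- lapply(demo[id], function(x) replace(x, x > 0, 1))
```

Whether this beats the for loop on the full 16000 by 5000 data would need to
be checked with system.time() as above.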

system.time({m <- as.matrix(mydata[, -(id)])
  m[m > 0] <- 1
  ans <- cbind(mydata[,1:4], as.data.frame(m))})
   user  system elapsed
  0.006   0.000   0.009
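
One thing to watch in the matrix version: with id <- 4:ncol(mydata), the
index -(id) drops columns 4 onward and keeps only columns 1 to 3. A minimal
sketch that selects the target columns explicitly and checks the result
against the loop version, assuming a small synthetic data frame (demo, ref
and its contents are hypothetical, not the real mydata):

```r
# Hypothetical stand-in for mydata: 3 annotation columns, then numeric columns.
set.seed(1)
demo <- data.frame(a = letters[1:5], b = 1:5, c = 5:1,
                   matrix(rnorm(5 * 4), nrow = 5))
id <- 4:ncol(demo)

# Loop version from above, used here as the reference result.
ref <- demo
for (i in id) {
  ref[[i]] <- ifelse(ref[[i]] > 0, 1, ref[[i]])
}

# Matrix version: select the target columns explicitly with id.
# Note that demo[, -(id)] would instead keep only columns 1 to 3.
m <- as.matrix(demo[, id])
m[m > 0] <- 1
ans <- cbind(demo[, -(id)], as.data.frame(m))

# Both approaches should agree.
stopifnot(isTRUE(all.equal(ref, ans)))
```

The matrix step relies on columns 4 onward all being numeric; a character
column in that range would turn the whole matrix into character.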

Many thanks for your great suggestions!

Best regards,

Julie

On 12/28/11 3:10 PM, "Julie Zhu" <julie.zhu at umassmed.edu> wrote:

> Thanks, Steve,
> 
> The matrix approach is definitely faster. I will try the list approach to
> see whether it is faster as well.
> 
> Best regards,
> 
> Julie
> 
> 
> On 12/28/11 3:06 PM, "Steve Lianoglou" <mailinglist.honeypot at gmail.com>
> wrote:
> 
>> Hi,
>> 
>> On Wed, Dec 28, 2011 at 3:01 PM, Zhu, Lihua (Julie)
>> <Julie.Zhu at umassmed.edu> wrote:
>>> Hi,
>>> 
>>> I have a data frame consisting of 5000 columns and 16000 rows. I would like
>>> to convert all values x in columns 4 to 5000 to 1 if x > 0. The following code
>>> works but it is very slow. Are there more efficient ways to modify a large
>>> number of entries in a data frame? Many thanks for your kind help!
>>> 
>>> id <- 4:ncol(mydata)
>>> for (i in id) {mydata[mydata[,i]>0,i]=1}
>> 
>> You might have better results if you treat the columns of the
>> data.frame as a list, so something like:
>> 
>> for (i in 4:ncol(mydata)) {
>>   mydata[[i]] <- ifelse(mydata[[i]] > 0, 1, mydata[[i]])
>> }
>> 
>> 
>> ## Or, what if you convert to a matrix?
>> m <- as.matrix(mydata[, -(1:3)])
>> m[m > 0] <- 1
>> ans <- cbind(mydata[, 1:3], as.data.frame(m))
>> 
>> 
>> Are any of those better?
>> 
>> -steve
> 


