[BioC] Help on alternative and efficient data frame manipulation

Wed Dec 28 21:06:53 CET 2011

Hi,

On Wed, Dec 28, 2011 at 3:01 PM, Zhu, Lihua (Julie)
<Julie.Zhu at umassmed.edu> wrote:
> Hi,
>
> I have a data frame consisting of 5000 columns and 16000 rows. I would like
> to convert all values x in column 4 to 5000 to 1 if x >0. The following code
> works but it is very slow. Are there more efficient ways to modify large
> number of entries in a data frame? Many thanks for your kind help!
>
> id <- 4:ncol(mydata)
> for (i in id) {mydata[mydata[,i]>0,i]=1}

You might get better results if you treat the data.frame as a list of
columns and replace each column with [[ ]], which skips the overhead of
row/column indexing via [ , ]. Something like:

for (i in 4:ncol(mydata)) {
  mydata[[i]] <- ifelse(mydata[[i]] > 0, 1, mydata[[i]])
}

## Or, what if you convert to a matrix?
## (columns 1:3 stay as they are; columns 4 onward get converted)
m <- as.matrix(mydata[, -(1:3)])
m[m > 0] <- 1
ans <- cbind(mydata[, 1:3], as.data.frame(m))

Are any of those better?
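
If it helps, here is a rough way to check that the two approaches agree before running them on the real data. It is only a sketch: `toy` is a made-up stand-in for mydata (three leading annotation columns I invented, followed by numeric count columns), and the per-column loop uses logical replacement, which should be equivalent to the ifelse() version when there are no NAs.

```r
## Hypothetical stand-in for mydata: 3 annotation columns + 50 numeric columns
set.seed(1)
toy <- data.frame(id = 1:200, chr = "chr1", pos = 1:200,
                  matrix(rpois(200 * 50, 0.5), nrow = 200))

## Approach 1: replace each column through [[ ]]
by_col <- toy
for (i in 4:ncol(by_col)) {
  x <- by_col[[i]]
  x[x > 0] <- 1          # set every positive value to 1, leave the rest alone
  by_col[[i]] <- x
}

## Approach 2: do the replacement once on a matrix, then glue it back
m <- as.matrix(toy[, -(1:3)])
m[m > 0] <- 1
by_mat <- cbind(toy[, 1:3], as.data.frame(m))

## Both approaches should give the same values in columns 4 onward
stopifnot(all(as.matrix(by_col[, -(1:3)]) == as.matrix(by_mat[, -(1:3)])))
```

Wrapping either approach in system.time({ ... }) on the full 16000 x 5000 data frame should show which one is faster on your machine.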

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact