[R] How can make loop more faster?

David Winsemius dwinsemius at comcast.net
Sun Jun 16 00:54:05 CEST 2013


On Jun 15, 2013, at 2:38 PM, Fabrice Tourre wrote:

> Dear expert,
> 
> How can I make the following codematrix function faster?
> 
> top20.dat <- top20.dat[,7:length(top20.dat[1,])]
> top40.dat <-
> read.table("top40snps.ped",header=F,sep="\t",stringsAsFactors=F)

Did you attach a file with a non-'.txt' extension?

> row.names(top40.dat) <- top40.dat[,1]
> top40.dat <- top40.dat[,7:length(top40.dat[1,])]
> codematrix <- function(dat)
> {
>    new.dat <- dat
>    for(col in 1:length(dat[1,]))
>    {

I'm guessing that using `ifelse` would be much faster than going column by column and then row by painful row through this testing with nested `if(cond){conseq}else{alter}`. You should gain efficiency by running the `strsplit` operation on a full column at a time. Build a function that works on one column at a time and then lapply it to the dataframe.
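
As a rough sketch of the whole-column idea (the genotype values and the names `geno`, `parts`, `a1`, `a2` and `homozygous` are made up for illustration, not taken from the poster's data), splitting once and comparing with `ifelse` might look like this:

```r
# Hypothetical genotype column; "0 0" marks a missing call.
geno <- c("A A", "A G", "G G", "0 0", "A A")

# Split every element of the column in one call.
parts <- strsplit(geno, " ", fixed = TRUE)

# Pull out the first and second allele of each genotype.
a1 <- vapply(parts, `[`, character(1), 1)
a2 <- vapply(parts, `[`, character(1), 2)

# One vectorized comparison replaces the inner row loop:
# 2 where both alleles agree, 1 where they differ.
homozygous <- ifelse(a1 == a2, 2, 1)
```

Note that "0 0" also counts as two equal alleles here, so the missing-value code still has to be handled as a separate step.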

>        tbl <- table(dat[,col])
>        max.allel <- names(which(tbl==max(table(dat[,col]))))
>        for(row in 1:length(dat[,1]))
>            {
>                if(dat[row,col]=="0 0")
>                {
>                    new.dat[row,col]=NA
>                }else{
>                    if(dat[row,col]==max.allel) {
>                        new.dat[row,col]=0
>                    }else{
>                        allele <- unlist(strsplit(
> as.character(dat[row,col])," "))
>                        if(allele[1]==allele[2]){
>                            new.dat[row,col]=2
>                        }else{
>                            new.dat[row,col]=1

You could leave the "==max.allel" test as the outermost of the nested ifelse operations so that it "overwrites" the results of testing the two split bits. But I would make it a %in% test so that it won't fail when more than one maximum occurs.
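
To illustrate why the tie case matters, here is a small sketch with made-up genotypes in which two values share the highest count (the names `geno`, `top`, `eq_match` and `in_match` are hypothetical). With `==` the vector of modes is recycled along the column, so the answer depends on position; `%in%` tests membership instead:

```r
# Made-up column where "A A" and "G G" are tied for most frequent.
geno <- c("G G", "A A", "A A", "G G", "A G")
tbl  <- table(geno)
top  <- names(tbl)[tbl == max(tbl)]   # two modes: "A A" and "G G"

# `==` recycles `top` along `geno` (R warns about the length mismatch),
# so a genotype only matches if it lines up with the recycled value.
eq_match <- suppressWarnings(geno == top)

# `%in%` asks whether each genotype equals any of the modes.
in_match <- geno %in% top
```

In this toy case `eq_match` misses the first two rows while `in_match` flags all four tied genotypes. Inside a scalar `if()`, a length-two condition is worse still: older versions of R warn and use only the first element, and recent versions treat it as an error.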

Perhaps (untested and a lot of guesswork):

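testsplitfunc <- function(col){
         temptbl <- table(col)
         # split each genotype string once for the whole column
         tempspl <- strsplit(as.character(col), split=" ")
         allele <- cbind( sapply(tempspl, "[", 1),
                          sapply(tempspl, "[", 2) )
         # 0 for the most frequent genotype(s), else 2 if homozygous, 1 if not
         res <- ifelse( col %in% names(temptbl)[ which(temptbl==max(temptbl)) ],
                     0,
                     ifelse( allele[,1]==allele[,2], 2, 1) )
         is.na(res) <- col=="0 0"    # "0 0" codes a missing genotype
         res                         # return the coded column
         }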

code.top20 <- do.call(cbind, lapply(top20.dat, testsplitfunc) )
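
One thing to watch with the `do.call(cbind, lapply(...))` pattern: `lapply` on a data frame walks over its columns, and `cbind` of the resulting plain vectors keeps the column names but drops the row names. A minimal sketch, assuming a made-up data frame `toy` and a stand-in per-column function `is_missing` (both hypothetical), shows how they could be restored:

```r
# Made-up data: two SNP columns, sample IDs as row names.
toy <- data.frame(
  snp1 = c("A A", "A G"),
  snp2 = c("C C", "0 0"),
  row.names = c("s1", "s2"),
  stringsAsFactors = FALSE
)

# Stand-in per-column function: any function that returns
# one value per row would fit in its place.
is_missing <- function(col) as.integer(col == "0 0")

# lapply() visits each column; cbind() assembles a matrix.
coded <- do.call(cbind, lapply(toy, is_missing))

# The row names are lost along the way, so copy them back.
rownames(coded) <- rownames(toy)
```

The result is a matrix rather than a data frame; wrapping it in `as.data.frame()` is an option if a data frame is needed downstream.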
                     




>                        }
>                    }
>                }
>            }
>        #})
>        cat(paste(col," ",sep=""))
>    }
>    return(new.dat)
> }
> code.top20 <- codematrix(top20.dat)

In the absence of a problem description I will leave the details unaddressed.


David Winsemius
Alameda, CA, USA


