[R] Efficiency challenge: MANY subsets

Johannes Graumann johannes_graumann at web.de
Fri Jan 16 21:16:14 CET 2009


Thanks. Very elegant, but it doesn't solve the problem of the outer "for" loop,
since I would now rewrite the code like so:

fragments <- list()
for(iN in seq(length(sequences))){
  cat(paste(iN,"\n"))
  fragments[[iN]] <-
    lapply(indexes[[iN]], function(g)sequences[[iN]][do.call(seq, as.list(g))])
}

This is still very slow for length(sequences) ~ 7000.

Joh
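
A minimal sketch of one way to drop the explicit outer loop, assuming the goal is the same list-of-lists shape as "fragments" above. Map() walks "sequences" and "indexes" in parallel, and seq.int() stands in for do.call(seq, ...). The helper name extract_fragments is hypothetical, and whether this is actually faster for ~7000 sequences has not been measured here.

```r
# Toy data copied from the thread: one sequence and its six ranges.
sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I",
    "M","N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T",
    "Y","F","N","I","N","I","N","I","D","K","M","Y","I","H","*")
)
indexes <- list(
  list(c(1,22), c(22,46), c(46,51), c(1,46), c(22,51), c(1,51))
)

# extract_fragments is a hypothetical helper (not from the thread):
# it subsets one sequence `s` by every c(start, end) range in `idx`.
extract_fragments <- function(s, idx) {
  lapply(idx, function(r) s[seq.int(r[1], r[2])])
}

# Map() pairs sequences[[i]] with indexes[[i]], so there is no loop
# counter and no list that grows one element at a time.
fragments <- Map(extract_fragments, sequences, indexes)

# Fragment lengths for the first sequence.
lengths(fragments[[1]])
# 22 25  6 46 30 51
```

The progress output via cat() is left out on purpose, since printing several thousand lines to the console can itself take noticeable time.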

On Friday 16 January 2009 14:23:47 Henrique Dallazuanna wrote:
> Try this:
>
> lapply(indexes[[1]], function(g)sequences[[1]][do.call(seq, as.list(g))])
>
> On Fri, Jan 16, 2009 at 11:06 AM, Johannes Graumann <johannes_graumann at web.de> wrote:
> > Hello,
> >
> > I have a list of character vectors like this:
> >
> > sequences <- list(
> >   c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I",
> >     "M","N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T",
> >     "Y","F","N","I","N","I","N","I","D","K","M","Y","I","H","*")
> > )
> >
> > and another list of subset ranges like this:
> >
> > indexes <- list(
> >  list(
> >    c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
> >  )
> > )
> >
> > What I now want to do is to subset each entry in "sequences"
> > (sequences[[1]]) with all ranges in the corresponding low level list in
> > "indexes" (indexes[[1]]). Here is what I came up with.
> >
> > fragments <- list()
> > for(iN in seq(length(sequences))){
> >  cat(paste(iN,"\n"))
> >  tmpFragments <- sapply(
> >    indexes[[iN]],
> >    function(x){
> >      sequences[[iN]][seq.int(x[1],x[2])]
> >    }
> >  )
> >  fragments[[iN]] <- tmpFragments
> > }
> >
> > This works fine, but "sequences" contains thousands of entries and the
> > corresponding "indexes" are sometimes hundreds of ranges long, so this
> > whole process is EXTREMELY inefficient.
> >
> > Will somebody out there take up the challenge and show me a way to
> > speed this up?
> >
> > Thanks for any hints,
> >
> > Joh
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
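
A hedged alternative for the subsetting step described in the original question, assuming it is acceptable to get each fragment back as a single string rather than a vector of one-letter elements: collapse the sequence once and let the vectorized substring() extract all ranges in one call. The variable names (residues, ranges, starts, ends, frags) are illustrative only, and no timing comparison has been made here.

```r
# The example sequence from the thread, as a vector of single letters.
residues <- c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L",
              "L","I","M","N","H","K","L","L","L","I","N","N","N","N","L","T",
              "E","V","H","T","Y","F","N","I","N","I","N","I","D","K","M","Y",
              "I","H","*")
ranges <- list(c(1,22), c(22,46), c(46,51), c(1,46), c(22,51), c(1,51))

# Collapse once into a single string.
s <- paste(residues, collapse = "")

# Pull the start and end positions out of the list of ranges.
starts <- vapply(ranges, `[`, numeric(1), 1)
ends   <- vapply(ranges, `[`, numeric(1), 2)

# substring() is vectorized over first/last: one call, all fragments.
frags <- substring(s, starts, ends)

frags[3]
# "KMYIH*"

# If single-letter vectors are needed after all, split the strings again.
frag_vectors <- strsplit(frags, "", fixed = TRUE)
```

This relies on every element of the sequence being exactly one character, as in the example data.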
