[BioC] Looping over entries of a GRanges object

Julian Gehring julian.gehring at embl.de
Thu May 22 15:14:34 CEST 2014


Hi Michael,

In my use case, the individual entries of the GRanges object define 
regions of interest.  For each ROI, I want to extract data from multiple 
sources and perform an analysis on them.

In this case, I cannot think of a way to avoid looping over the
entries.  For a few large regions, this is fine.  But as the GRanges
object grows longer, most of the time is spent extracting the
individual ranges.

An alternative would be to convert the columns of the GRanges object
into a data frame or a set of vectors, and loop over those.  But if you
want to use standard Bioconductor functions (e.g. getSeq), you would
have to construct a GRanges object every time (which is also slow).
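
A minimal sketch of the "set of vectors" alternative, assuming the
per-region analysis only needs the chromosome name, start and end.  The
names 'chr', 's', 'e' and 'widths' are made up for illustration, and the
width computation is just a placeholder for the real analysis:

```r
library(GenomicRanges)

n <- 1e4
gr <- GRanges("1", IRanges(1:n, width = 1))

## Extract plain vectors once, outside the loop, so that no GRanges
## subsetting (gr[i]) happens per iteration
chr <- as.character(seqnames(gr))
s <- start(gr)
e <- end(gr)

widths <- integer(length(gr))
for (i in seq_along(gr)) {
    ## placeholder for the per-region analysis: it only touches
    ## chr[i], s[i] and e[i], never the GRanges object itself
    widths[i] <- e[i] - s[i] + 1L
}
```

The same columns are available through as.data.frame(gr) if a data
frame is more convenient.  This only helps as long as the loop body
does not need a GRanges object itself.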

Best wishes
Julian


On 22.05.2014 15:06, Michael Lawrence wrote:
> Hi Julian,
>
> What sort of operations need to be implemented this way? I'm usually
> able to avoid them somehow. As a last resort, reducing the complexity of
> the object helps. For example, you could operate over ranges(gr) if all
> you need are the starts and ends.
>
> Michael
>
>
> On Thu, May 22, 2014 at 5:56 AM, Julian Gehring
> <julian.gehring at embl.de> wrote:
>
>     Hi,
>
>     Is there a fast way of looping over/extracting the entries of a
>     'GRanges' object individually?  Due to the complex structure of a
>     GRanges object, a simple solution like
>
>     library(GenomicRanges)
>     n = 1e4
>     gr = GRanges(1, IRanges(1:n, width = 1))
>
>     for(i in seq_along(gr)) {
>          x = gr[i]
>          ## more complex code acting on 'x'
>     }
>
>     takes notably long.  Converting to a 'GRangesList' or
>     'GenomicRangesList' obviously does not improve the situation.  I was
>     wondering if there is some dedicated functionality which would allow
>     this in a faster manner, in cases where simple vectorized operations
>     are not applicable.
>
>     Best wishes
>     Julian