[BioC] bug in GenomicRanges gaps?

Blanchette, Marco MAB at stowers.org
Fri Sep 6 02:52:08 CEST 2013


Little bit more investigating let me realize that it happen when the seqinfo has seqLengths populated, check this out

> genes <- GRanges(seqnames=c(rep("2L",3),rep("3L",3)),
+                  ranges=IRanges(start=rep(c(20,200,350),2),end=rep(c(80,275,450),2)))

### Adding some chromosome width
> seqlengths(seqinfo(genes)) <- c(500,700)

### Computing the gaps
> intergenic <- gaps(genes)

### Now I am getting overlapping gaps!
> length(findOverlaps(intergenic,ignoreSelf=TRUE))
[1] 32

### Take a look at ranges 1,2, 7 and 8, they are spanning the full chromosome. Why... Expected? 
> intergenic[c(1,2,7,8)]
GRanges with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]       2L  [1, 500]      +
  [2]       2L  [1, 500]      -
  [3]       3L  [1, 700]      +
  [4]       3L  [1, 700]      -
  ---
  seqlengths:
    2L  3L
   500 700

Thanks

-- 
Marco Blanchette, Ph.D.
Assistant Investigator 
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071 
Cell: 816-726-8419 
Fax: 816-926-2018 

On Sep 5, 2013, at 7:14 PM, "Blanchette, Marco" <MAB at stowers.org> wrote:

> Hi,
> 
> Is this a bug of normal behavior... (I suspect it's a bug). I am trying to compute the intergenic regions using GenomicFeatures txdb and GenomicRanges gaps() and the GRanges return contains entry spanning all chromosomes on both strand... On a mock GRanges, I don't get this weird behavior (See below). Any suggestions?
> 
> thanks
> 
>> library(GenomicFeatures)
> ### Downloading the txdb from UCSC
>> txdb <- makeTranscriptDbFromUCSC(genome="dm3",
>                                 tablename="ensGene")
> 
> ### Getting the transcripts for each genes
>> allTx <- transcriptsBy(txdb, by='gene')
> 
> ### Geting the GRanges for each genes
>> genes <- unlist(range(allTx))
> 
> ### Removing the strand to perform the reduce operation
>> strand(genes) <- '*'
> 
> ### Now, let's find the gaps
>> genic <- reduce(genes.noStrand)
> 
> ### Because of teh reduce, there should be no overlaping genic regions
>> length(findOverlaps(genic,ignoreSelf=TRUE))
> [1] 0
> 
> ### Finding the intergenic regions
>> intergenic <- gaps(genic)
> 
> ### The difference of non-overlaping ranges should return non-overlaping gaps, but...
>> length(findOverlaps(intergenic,ignoreSelf=TRUE))
> [1] 46300
> 
> ### What's funky is that I am getting gaps spanning the full lenght of each chromosomes
> ### Why???
>> chrs.width <- seqlengths(seqinfo(intergenic))
>> intergenic[width(intergenic) == gene2chrsLength]
> GRanges with 30 ranges and 0 metadata columns:
>        seqnames        ranges strand
>           <Rle>     <IRanges>  <Rle>
>   [1]     chr2L [1, 23011544]      +
>   [2]     chr2L [1, 23011544]      -
>   [3]     chr2R [1, 21146708]      +
>   [4]     chr2R [1, 21146708]      -
>   [5]     chr3L [1, 24543557]      +
>   ...       ...           ...    ...
>  [26]   chrXHet [1,   204112]      -
>  [27]   chrYHet [1,   347038]      +
>  [28]   chrYHet [1,   347038]      -
>  [29] chrUextra [1, 29004656]      +
>  [30] chrUextra [1, 29004656]      -
>  ---
>  seqlengths:
>       chr2L     chr2R     chr3L     chr3R ...   chrXHet   chrYHet chrUextra
>    23011544  21146708  24543557  27905053 ...    204112    347038  29004656
> 
> 
> ### On Fake data
>> genes <- GRanges(seqnames=c(rep("2L",3),rep("3L",3)),
>                 ranges=IRanges(start=rep(c(20,200,350),2),end=rep(c(80,275,450),2)))
>> length(findOverlaps(intergenic,ignoreSelf=TRUE))
> [1] 0
>> intergenic
> GRanges with 6 ranges and 0 metadata columns:
>      seqnames     ranges strand
>         <Rle>  <IRanges>  <Rle>
>  [1]       2L [  1,  19]      *
>  [2]       2L [ 81, 199]      *
>  [3]       2L [276, 349]      *
>  [4]       3L [  1,  19]      *
>  [5]       3L [ 81, 199]      *
>  [6]       3L [276, 349]      *
>  ---
>  seqlengths:
>   2L 3L
>   NA NA
> 
> -- 
> Marco Blanchette, Ph.D.
> Assistant Investigator 
> Stowers Institute for Medical Research
> 1000 East 50th St.
> 
> Kansas City, MO 64110
> 
> Tel: 816-926-4071 
> Cell: 816-726-8419 
> Fax: 816-926-2018 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list