[BioC] bug in GenomicRanges gaps?

Blanchette, Marco MAB at stowers.org
Fri Sep 6 03:07:34 CEST 2013


though... sorry about the genes.noStrand... was working on a different example before posting... bad pasting...

And yes, I was trying to find the gaps between genic regions regardless of there strand. I was doing the same operations as yours until I fumble onto gaps() which seemed to encapsulate nicely in one function the concept of finding gaps between ranges (right??) and if you look at the operation returned, it uses the seqinfo to compute the right and left gaps in relation to the entire chromosme. However, it returns extra ranges that conceptually are not gaps (ie span the full chromosome)... 

But thanks for the tidbits Malcolm.
-- 
Marco Blanchette, Ph.D.
Assistant Investigator 
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071 
Cell: 816-726-8419 
Fax: 816-926-2018 

On Sep 5, 2013, at 7:53 PM, "Cook, Malcolm" <MEC at stowers.org>
 wrote:

> Marco,
> 
> genes.noStrand is unset when you first reference it, so it is hard to know what you are trying to do.
> 
> Does your definition of intergenic want to ignore strand (i.e. can the same locus be genic on one strand and integenic on another?)
> 
> I'm guessing the problem you are witnessing is that the strand _is_ not '*' when you thought you had set it to '*'.  Check this!
> 
> Also, does your definition of intergenic want to include non-genic segments that are the very beginning/end of a chromosome (and therefore only have a gene on one-side and thus arguably are not truly inter-genic).  If so, your approach is not going to include these chromosome-terminal regions.  In which case, you might try:
> 
> genic.gr<-unlist(transcriptsBy(tr.db,'gene'),use.names=FALSE)                                                                                                                       
> strand(genic.gr)<-'*'                                                                                                                                                               
> genic.gr<-reduce(genic.gr)                                                                                                                                                          
> chrom.gr<-as(seqinfo(tr.db),'GRanges')                                                                                                                                              
> intergenic.gr<-setdiff(chrom.gr,genic.gr)   
> 
> stopifnot(0==length(findOverlaps(intergenic.gr,ignoreSelf=TRUE)))                                                                                                                   
> 
> ~ malcolm_cook at stowers.org
> 
> ________________________________________
> From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Blanchette, Marco
> Sent: Thursday, September 05, 2013 7:14 PM
> To: BioConductor
> Subject: [BioC] bug in GenomicRanges gaps?
> 
> Hi,
> 
> Is this a bug of normal behavior... (I suspect it's a bug). I am trying to compute the intergenic regions using GenomicFeatures txdb and GenomicRanges gaps() and the GRanges return contains entry spanning all chromosomes on both strand... On a mock GRanges, I don't get this weird behavior (See below). Any suggestions?
> 
> thanks
> 
>> library(GenomicFeatures)
> ### Downloading the txdb from UCSC
>> txdb <- makeTranscriptDbFromUCSC(genome="dm3",
>                                 tablename="ensGene")
> 
> ### Getting the transcripts for each genes
>> allTx <- transcriptsBy(txdb, by='gene')
> 
> ### Geting the GRanges for each genes
>> genes <- unlist(range(allTx))
> 
> ### Removing the strand to perform the reduce operation
>> strand(genes) <- '*'
> 
> ### Now, let's find the gaps
>> genic <- reduce(genes.noStrand)
> 
> ### Because of teh reduce, there should be no overlaping genic regions
>> length(findOverlaps(genic,ignoreSelf=TRUE))
> [1] 0
> 
> ### Finding the intergenic regions
>> intergenic <- gaps(genic)
> 
> ### The difference of non-overlaping ranges should return non-overlaping gaps, but...
>> length(findOverlaps(intergenic,ignoreSelf=TRUE))
> [1] 46300
> 
> ### What's funky is that I am getting gaps spanning the full lenght of each chromosomes
> ### Why???
>> chrs.width <- seqlengths(seqinfo(intergenic))
>> intergenic[width(intergenic) == gene2chrsLength]
> GRanges with 30 ranges and 0 metadata columns:
>        seqnames        ranges strand
>           <Rle>     <IRanges>  <Rle>
>   [1]     chr2L [1, 23011544]      +
>   [2]     chr2L [1, 23011544]      -
>   [3]     chr2R [1, 21146708]      +
>   [4]     chr2R [1, 21146708]      -
>   [5]     chr3L [1, 24543557]      +
>   ...       ...           ...    ...
>  [26]   chrXHet [1,   204112]      -
>  [27]   chrYHet [1,   347038]      +
>  [28]   chrYHet [1,   347038]      -
>  [29] chrUextra [1, 29004656]      +
>  [30] chrUextra [1, 29004656]      -
>  ---
>  seqlengths:
>       chr2L     chr2R     chr3L     chr3R ...   chrXHet   chrYHet chrUextra
>    23011544  21146708  24543557  27905053 ...    204112    347038  29004656
> 
> 
> ### On Fake data
>> genes <- GRanges(seqnames=c(rep("2L",3),rep("3L",3)),
>                 ranges=IRanges(start=rep(c(20,200,350),2),end=rep(c(80,275,450),2)))
>> length(findOverlaps(intergenic,ignoreSelf=TRUE))
> [1] 0
>> intergenic
> GRanges with 6 ranges and 0 metadata columns:
>      seqnames     ranges strand
>         <Rle>  <IRanges>  <Rle>
>  [1]       2L [  1,  19]      *
>  [2]       2L [ 81, 199]      *
>  [3]       2L [276, 349]      *
>  [4]       3L [  1,  19]      *
>  [5]       3L [ 81, 199]      *
>  [6]       3L [276, 349]      *
>  ---
>  seqlengths:
>   2L 3L
>   NA NA
> 
> --
> Marco Blanchette, Ph.D.
> Assistant Investigator
> Stowers Institute for Medical Research
> 1000 East 50th St.
> 
> Kansas City, MO 64110
> 
> Tel: 816-926-4071
> Cell: 816-726-8419
> Fax: 816-926-2018
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 



More information about the Bioconductor mailing list