[BioC] second GenomicRanges request - enable query-self comparison (and so ignoreSelf and ignoreRedundant) for GRangesList objects?

Janet Young jayoung at fhcrc.org
Wed Jun 11 00:14:00 CEST 2014


Hi again,

This is somewhat related to another request I sent earlier this afternoon. Is it possible to implement query-self comparisons (i.e. where I specify query but not subject) for GRangesList objects?   The motivation is that I'd like to use the ignoreSelf and ignoreRedundant options in a GRangesList comparison.

I mentioned in my other request that I'm looking through a set of genes to find pairs that overlap on opposite strands.  I'm now using findOverlaps on a GRangesList object instead of a GRanges object - that's because I want to only look at cases where parts of the final spliced transcript overlap, not cases where a large intron-containing gene has another smaller gene nested in an intron.  When I used findOverlaps on the entire genes at once as GRanges objects, my results included pairs of that nested type, but if I use the blocks function to get just the exons as a GRangesList object, that lets me successfully ignore the nested gene case, which is great.  However, with GRangesList I can't use the query-self comparison and therefore can't access those useful ignoreSelf and ignoreRedundant options.  I know I can workaround that too with some effort, but it'd be great to have it as part of the underlying code.

Again, I've included some code below that should show what I'm trying to do.

all the best,

Janet


library(rtracklayer)

#### get some drosophila genes as a test case:
mySession <- browserSession()
genome(mySession) <- "dm3"
genes <- ucscTableQuery (mySession, track="flyBaseGene", table="flyBaseGene")
genes <- track(genes)

#### reduce to a smaller example that contains a gene pair of the type I'm talking about (CG33797-RA is nested inside CG11152-RA)
genes <- genes[148:152]

#### remove strand info, as a workaround to not being able to specify ignore.strand for a query-self findOverlaps call
strand(genes) <- "*"

#### using findOverlaps on the genes themselves shows me the nested pair (query=3, subject=4)
findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant = TRUE)

#### so I'll use blocks to extract only the exonic portions of the genes as a GRangesList:
genes_GRL <- blocks(genes)

#### and use findOverlaps on that GRangesList object, first by specifying it as both query and subject in the comparison - this gives me more or less what I want (i.e. it does NOT show the nested pair 3-4), except that there's a bunch of filtering to do later.
findOverlaps( genes_GRL, genes_GRL)

#### ideally, to help me filter the hits I'd like to be able to use ignoreSelf and ignoreRedundant, but I can only use those if it's a query-self comparison (i.e. only works if no subject is specified)
findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, ignoreRedundant=TRUE)
# Error in .local(query, subject, maxgap, minoverlap, type, select, ...) : 
#   unused arguments (ignoreSelf = TRUE, ignoreRedundant = TRUE)

#### and it looks like findOverlaps is not implemented for the query-self case for GRangesList objects
findOverlaps( genes_GRL)
# Error in match.arg(type) : 'arg' must be of length 1

sessionInfo()

R version 3.1.0 Patched (2014-05-26 r65771)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] rtracklayer_1.25.11   GenomicRanges_1.17.17 GenomeInfoDb_1.1.6   
[4] IRanges_1.99.15       S4Vectors_0.0.8       BiocGenerics_0.11.2  

loaded via a namespace (and not attached):
 [1] BatchJobs_1.2            BBmisc_1.6               BiocParallel_0.7.2      
 [4] Biostrings_2.33.10       bitops_1.0-6             brew_1.0-6              
 [7] codetools_0.2-8          DBI_0.2-7                digest_0.6.4            
[10] fail_1.2                 foreach_1.4.2            GenomicAlignments_1.1.13
[13] iterators_1.0.7          plyr_1.8.1               Rcpp_0.11.2             
[16] RCurl_1.95-4.1           Rsamtools_1.17.23        RSQLite_0.11.4          
[19] sendmailR_1.1-2          stats4_3.1.0             stringr_0.6.2           
[22] tools_3.1.0              XML_3.98-1.1             XVector_0.5.6           
[25] zlibbioc_1.11.1         



More information about the Bioconductor mailing list