[BioC] second GenomicRanges request - enable query-self comparison (and so ignoreSelf and ignoreRedundant) for GRangesList objects?

Valerie Obenchain vobencha at fhcrc.org
Wed Jun 18 22:50:22 CEST 2014


I think Michael forgot to bump the version after his fix so the package 
didn't propagate. It's bumped now and 1.99.16 should be available w/ 
biocLite() by noon tomorrow or in svn immediately.

findOverlaps() now works on a GRangesList w/ no subject and the ignore* 
args as necessary. The default for 'type' is "any" for all 
findOverlaps() methods.

example(GRangesList) ## for 'grl'
>> findOverlaps(grl)
> Hits of length 3
> queryLength: 3
> subjectLength: 3
>   queryHits subjectHits
>    <integer>   <integer>
>  1         1           1
>  2         2           2
>  3         3           3


Valerie

On 06/17/2014 11:54 AM, Janet Young wrote:
> Thanks very much all - yes, I can now get what I want by adding type="any":
>       findOverlaps( genes_GRL, type="any", ignoreSelf=TRUE, ignoreRedundant=TRUE)
> I guess in the long term that'll be unnecessary?
   It seems to make sense to have a default of type="any", to be 
consistent with all the other findOverlaps methods.
>
> Janet
>
>
>
>
> On Jun 15, 2014, at 12:35 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>
>> Great. Thanks.
>>
>> Val
>>
>> On 06/14/2014 10:27 PM, Michael Lawrence wrote:
>>> A GRangesList is a Vector, so that method should work. It just neglected
>>> to resolve the type argument with match.arg() before forwarding. That's
>>> now fixed.
>>>
>>> Michael
>>>
>>>
>>> On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at fhcrc.org
>>> <mailto:vobencha at fhcrc.org>> wrote:
>>>
>>>     Hi Janet, Michael,
>>>
>>>     Janet, were you able to get what you needed by specifying 'type'?
>>>
>>>     Michael, which method are you referring to when you say it exists
>>>     but is broken? The only method I see that allows a missing 'subject'
>>>     is query="Vector". Maybe we need to add query="GRangesList"
>>>     subject="missing"?
>>>
>>>
>>>     Val
>>>
>>>
>>>
>>>     On 06/11/2014 11:15 AM, Michael Lawrence wrote:
>>>
>>>         Turns out I added something else. In this case, the method does
>>>         exist, but
>>>         it's just broken. You might have better luck if you specify the type
>>>         argument to findOverlaps, like type="any".
>>>
>>>
>>>         On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at fhcrc.org
>>>         <mailto:jayoung at fhcrc.org>> wrote:
>>>
>>>             I'm glad that's on your radar already - nice.  I updated all
>>>             devel
>>>             packages this morning - it's not there yet.  Here's my new
>>>             sessionInfo():
>>>
>>>                 sessionInfo()
>>>
>>>             R version 3.1.0 Patched (2014-05-26 r65771)
>>>             Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>>             locale:
>>>                [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>                [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>                [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>                [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>                [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>             [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>>             attached base packages:
>>>             [1] parallel  stats     graphics  grDevices utils
>>>             datasets  methods
>>>             [8] base
>>>
>>>             other attached packages:
>>>             [1] rtracklayer_1.25.11   GenomicRanges_1.17.18
>>>             GenomeInfoDb_1.1.7
>>>             [4] IRanges_1.99.15       S4Vectors_0.0.8
>>>             BiocGenerics_0.11.2
>>>
>>>             loaded via a namespace (and not attached):
>>>                [1] BatchJobs_1.2            BBmisc_1.6
>>>             BiocParallel_0.7.2
>>>
>>>                [4] Biostrings_2.33.10       bitops_1.0-6
>>>             brew_1.0-6
>>>
>>>                [7] codetools_0.2-8          DBI_0.2-7
>>>               digest_0.6.4
>>>
>>>             [10] fail_1.2                 foreach_1.4.2
>>>             GenomicAlignments_1.1.14
>>>             [13] iterators_1.0.7          plyr_1.8.1
>>>             Rcpp_0.11.2
>>>
>>>             [16] RCurl_1.95-4.1           Rsamtools_1.17.25
>>>               RSQLite_0.11.4
>>>
>>>             [19] sendmailR_1.1-2          stats4_3.1.0
>>>             stringr_0.6.2
>>>
>>>             [22] tools_3.1.0              XML_3.98-1.1
>>>             XVector_0.5.6
>>>
>>>             [25] zlibbioc_1.11.1
>>>
>>>             On Jun 10, 2014, at 3:31 PM, Michael Lawrence
>>>             <lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>>
>>>             wrote:
>>>
>>>             I think I might have already added this a couple days ago to
>>>             devel...
>>>             tough to keep it all straight in my head
>>>
>>>
>>>             On Tue, Jun 10, 2014 at 3:14 PM, Janet Young
>>>             <jayoung at fhcrc.org <mailto:jayoung at fhcrc.org>> wrote:
>>>
>>>                 Hi again,
>>>
>>>                 This is somewhat related to another request I sent
>>>                 earlier this
>>>                 afternoon. Is it possible to implement query-self
>>>                 comparisons (i.e. where I
>>>                 specify query but not subject) for GRangesList objects?
>>>                    The motivation is
>>>                 that I'd like to use the ignoreSelf and ignoreRedundant
>>>                 options in a
>>>                 GRangesList comparison.
>>>
>>>                 I mentioned in my other request that I'm looking through
>>>                 a set of genes
>>>                 to find pairs that overlap on opposite strands.  I'm now
>>>                 using findOverlaps
>>>                 on a GRangesList object instead of a GRanges object -
>>>                 that's because I want
>>>                 to only look at cases where parts of the final spliced
>>>                 transcript overlap,
>>>                 not cases where a large intron-containing gene has
>>>                 another smaller gene
>>>                 nested in an intron.  When I used findOverlaps on the
>>>                 entire genes at once
>>>                 as GRanges objects, my results included pairs of that
>>>                 nested type, but if I
>>>                 use the blocks function to get just the exons as a
>>>                 GRangesList object, that
>>>                 lets me successfully ignore the nested gene case, which
>>>                 is great.  However,
>>>                 with GRangesList I can't use the query-self comparison
>>>                 and therefore can't
>>>                 access those useful ignoreSelf and ignoreRedundant
>>>                 options.  I know I can
>>>                 workaround that too with some effort, but it'd be great
>>>                 to have it as part
>>>                 of the underlying code.
>>>
>>>                 Again, I've included some code below that should show
>>>                 what I'm trying to
>>>                 do.
>>>
>>>                 all the best,
>>>
>>>                 Janet
>>>
>>>
>>>                 library(rtracklayer)
>>>
>>>                 #### get some drosophila genes as a test case:
>>>                 mySession <- browserSession()
>>>                 genome(mySession) <- "dm3"
>>>                 genes <- ucscTableQuery (mySession, track="flyBaseGene",
>>>                 table="flyBaseGene")
>>>                 genes <- track(genes)
>>>
>>>                 #### reduce to a smaller example that contains a gene
>>>                 pair of the type
>>>                 I'm talking about (CG33797-RA is nested inside CG11152-RA)
>>>                 genes <- genes[148:152]
>>>
>>>                 #### remove strand info, as a workaround to not being
>>>                 able to specify
>>>                 ignore.strand for a query-self findOverlaps call
>>>                 strand(genes) <- "*"
>>>
>>>                 #### using findOverlaps on the genes themselves shows me
>>>                 the nested pair
>>>                 (query=3, subject=4)
>>>                 findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant
>>>                 = TRUE)
>>>
>>>                 #### so I'll use blocks to extract only the exonic
>>>                 portions of the genes
>>>                 as a GRangesList:
>>>                 genes_GRL <- blocks(genes)
>>>
>>>                 #### and use findOverlaps on that GRangesList object,
>>>                 first by specifying
>>>                 it as both query and subject in the comparison - this
>>>                 gives me more or less
>>>                 what I want (i.e. it does NOT show the nested pair 3-4),
>>>                 except that
>>>                 there's a bunch of filtering to do later.
>>>                 findOverlaps( genes_GRL, genes_GRL)
>>>
>>>                 #### ideally, to help me filter the hits I'd like to be
>>>                 able to use
>>>                 ignoreSelf and ignoreRedundant, but I can only use those
>>>                 if it's a
>>>                 query-self comparison (i.e. only works if no subject is
>>>                 specified)
>>>                 findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE,
>>>                 ignoreRedundant=TRUE)
>>>                 # Error in .local(query, subject, maxgap, minoverlap,
>>>                 type, select, ...) :
>>>                 #   unused arguments (ignoreSelf = TRUE, ignoreRedundant
>>>                 = TRUE)
>>>
>>>                 #### and it looks like findOverlaps is not implemented
>>>                 for the query-self
>>>                 case for GRangesList objects
>>>                 findOverlaps( genes_GRL)
>>>                 # Error in match.arg(type) : 'arg' must be of length 1
>>>
>>>                 sessionInfo()
>>>
>>>                 R version 3.1.0 Patched (2014-05-26 r65771)
>>>                 Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>>                 locale:
>>>                    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>                    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>                    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>                    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>                    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>                 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>>                 attached base packages:
>>>                 [1] parallel  stats     graphics  grDevices utils
>>>                 datasets  methods
>>>                 [8] base
>>>
>>>                 other attached packages:
>>>                 [1] rtracklayer_1.25.11   GenomicRanges_1.17.17
>>>                 GenomeInfoDb_1.1.6
>>>                 [4] IRanges_1.99.15       S4Vectors_0.0.8
>>>                 BiocGenerics_0.11.2
>>>
>>>                 loaded via a namespace (and not attached):
>>>                    [1] BatchJobs_1.2            BBmisc_1.6
>>>                 BiocParallel_0.7.2
>>>                    [4] Biostrings_2.33.10       bitops_1.0-6
>>>                 brew_1.0-6
>>>                    [7] codetools_0.2-8          DBI_0.2-7
>>>                   digest_0.6.4
>>>                 [10] fail_1.2                 foreach_1.4.2
>>>                    GenomicAlignments_1.1.13
>>>                 [13] iterators_1.0.7          plyr_1.8.1
>>>                 Rcpp_0.11.2
>>>                 [16] RCurl_1.95-4.1           Rsamtools_1.17.23
>>>                   RSQLite_0.11.4
>>>                 [19] sendmailR_1.1-2          stats4_3.1.0
>>>                 stringr_0.6.2
>>>                 [22] tools_3.1.0              XML_3.98-1.1
>>>                 XVector_0.5.6
>>>                 [25] zlibbioc_1.11.1
>>>
>>>                 _________________________________________________
>>>                 Bioconductor mailing list
>>>                 Bioconductor at r-project.org
>>>                 <mailto:Bioconductor at r-project.org>
>>>                 https://stat.ethz.ch/mailman/__listinfo/bioconductor
>>>                 <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>>                 Search the archives:
>>>                 http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>>>                 <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>
>>>
>>>
>>>
>>>
>>>                  [[alternative HTML version deleted]]
>>>
>>>
>>>         _________________________________________________
>>>         Bioconductor mailing list
>>>         Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>>         https://stat.ethz.ch/mailman/__listinfo/bioconductor
>>>         <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>>         Search the archives:
>>>         http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>>>         <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>
>>>
>>>
>>
>



More information about the Bioconductor mailing list