[BioC] second GenomicRanges request - enable query-self comparison (and so ignoreSelf and ignoreRedundant) for GRangesList objects?
Janet Young
jayoung at fhcrc.org
Tue Jun 17 20:54:50 CEST 2014
Thanks very much all - yes, I can now get what I want by adding type="any":
findOverlaps( genes_GRL, type="any", ignoreSelf=TRUE, ignoreRedundant=TRUE)
I guess in the long term that'll be unnecessary? It seems to make sense to have a default of type="any", to be consistent with all the other findOverlaps methods.
Janet
On Jun 15, 2014, at 12:35 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
> Great. Thanks.
>
> Val
>
> On 06/14/2014 10:27 PM, Michael Lawrence wrote:
>> A GRangesList is a Vector, so that method should work. It just neglected
>> to resolve the type argument with match.arg() before forwarding. That's
>> now fixed.
>>
>> Michael
>>
>>
>> On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at fhcrc.org
>> <mailto:vobencha at fhcrc.org>> wrote:
>>
>> Hi Janet, Michael,
>>
>> Janet, were you able to get what you needed by specifying 'type'?
>>
>> Michael, which method are you referring to when you say it exists
>> but is broken? The only method I see that allows a missing 'subject'
>> is query="Vector". Maybe we need to add query="GRangesList"
>> subject="missing"?
>>
>>
>> Val
>>
>>
>>
>> On 06/11/2014 11:15 AM, Michael Lawrence wrote:
>>
>> Turns out I added something else. In this case, the method does
>> exist, but
>> it's just broken. You might have better luck if you specify the type
>> argument to findOverlaps, like type="any".
>>
>>
>> On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at fhcrc.org
>> <mailto:jayoung at fhcrc.org>> wrote:
>>
>> I'm glad that's on your radar already - nice. I updated all
>> devel
>> packages this morning - it's not there yet. Here's my new
>> sessionInfo():
>>
>> sessionInfo()
>>
>> R version 3.1.0 Patched (2014-05-26 r65771)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils
>> datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.18
>> GenomeInfoDb_1.1.7
>> [4] IRanges_1.99.15 S4Vectors_0.0.8
>> BiocGenerics_0.11.2
>>
>> loaded via a namespace (and not attached):
>> [1] BatchJobs_1.2 BBmisc_1.6
>> BiocParallel_0.7.2
>>
>> [4] Biostrings_2.33.10 bitops_1.0-6
>> brew_1.0-6
>>
>> [7] codetools_0.2-8 DBI_0.2-7
>> digest_0.6.4
>>
>> [10] fail_1.2 foreach_1.4.2
>> GenomicAlignments_1.1.14
>> [13] iterators_1.0.7 plyr_1.8.1
>> Rcpp_0.11.2
>>
>> [16] RCurl_1.95-4.1 Rsamtools_1.17.25
>> RSQLite_0.11.4
>>
>> [19] sendmailR_1.1-2 stats4_3.1.0
>> stringr_0.6.2
>>
>> [22] tools_3.1.0 XML_3.98-1.1
>> XVector_0.5.6
>>
>> [25] zlibbioc_1.11.1
>>
>> On Jun 10, 2014, at 3:31 PM, Michael Lawrence
>> <lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>>
>> wrote:
>>
>> I think I might have already added this a couple days ago to
>> devel...
>> tough to keep it all straight in my head
>>
>>
>> On Tue, Jun 10, 2014 at 3:14 PM, Janet Young
>> <jayoung at fhcrc.org <mailto:jayoung at fhcrc.org>> wrote:
>>
>> Hi again,
>>
>> This is somewhat related to another request I sent
>> earlier this
>> afternoon. Is it possible to implement query-self
>> comparisons (i.e. where I
>> specify query but not subject) for GRangesList objects?
>> The motivation is
>> that I'd like to use the ignoreSelf and ignoreRedundant
>> options in a
>> GRangesList comparison.
>>
>> I mentioned in my other request that I'm looking through
>> a set of genes
>> to find pairs that overlap on opposite strands. I'm now
>> using findOverlaps
>> on a GRangesList object instead of a GRanges object -
>> that's because I want
>> to only look at cases where parts of the final spliced
>> transcript overlap,
>> not cases where a large intron-containing gene has
>> another smaller gene
>> nested in an intron. When I used findOverlaps on the
>> entire genes at once
>> as GRanges objects, my results included pairs of that
>> nested type, but if I
>> use the blocks function to get just the exons as a
>> GRangesList object, that
>> lets me successfully ignore the nested gene case, which
>> is great. However,
>> with GRangesList I can't use the query-self comparison
>> and therefore can't
>> access those useful ignoreSelf and ignoreRedundant
>> options. I know I can
>> workaround that too with some effort, but it'd be great
>> to have it as part
>> of the underlying code.
>>
>> Again, I've included some code below that should show
>> what I'm trying to
>> do.
>>
>> all the best,
>>
>> Janet
>>
>>
>> library(rtracklayer)
>>
>> #### get some drosophila genes as a test case:
>> mySession <- browserSession()
>> genome(mySession) <- "dm3"
>> genes <- ucscTableQuery (mySession, track="flyBaseGene",
>> table="flyBaseGene")
>> genes <- track(genes)
>>
>> #### reduce to a smaller example that contains a gene
>> pair of the type
>> I'm talking about (CG33797-RA is nested inside CG11152-RA)
>> genes <- genes[148:152]
>>
>> #### remove strand info, as a workaround to not being
>> able to specify
>> ignore.strand for a query-self findOverlaps call
>> strand(genes) <- "*"
>>
>> #### using findOverlaps on the genes themselves shows me
>> the nested pair
>> (query=3, subject=4)
>> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant
>> = TRUE)
>>
>> #### so I'll use blocks to extract only the exonic
>> portions of the genes
>> as a GRangesList:
>> genes_GRL <- blocks(genes)
>>
>> #### and use findOverlaps on that GRangesList object,
>> first by specifying
>> it as both query and subject in the comparison - this
>> gives me more or less
>> what I want (i.e. it does NOT show the nested pair 3-4),
>> except that
>> there's a bunch of filtering to do later.
>> findOverlaps( genes_GRL, genes_GRL)
>>
>> #### ideally, to help me filter the hits I'd like to be
>> able to use
>> ignoreSelf and ignoreRedundant, but I can only use those
>> if it's a
>> query-self comparison (i.e. only works if no subject is
>> specified)
>> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE,
>> ignoreRedundant=TRUE)
>> # Error in .local(query, subject, maxgap, minoverlap,
>> type, select, ...) :
>> # unused arguments (ignoreSelf = TRUE, ignoreRedundant
>> = TRUE)
>>
>> #### and it looks like findOverlaps is not implemented
>> for the query-self
>> case for GRangesList objects
>> findOverlaps( genes_GRL)
>> # Error in match.arg(type) : 'arg' must be of length 1
>>
>> sessionInfo()
>>
>> R version 3.1.0 Patched (2014-05-26 r65771)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils
>> datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.17
>> GenomeInfoDb_1.1.6
>> [4] IRanges_1.99.15 S4Vectors_0.0.8
>> BiocGenerics_0.11.2
>>
>> loaded via a namespace (and not attached):
>> [1] BatchJobs_1.2 BBmisc_1.6
>> BiocParallel_0.7.2
>> [4] Biostrings_2.33.10 bitops_1.0-6
>> brew_1.0-6
>> [7] codetools_0.2-8 DBI_0.2-7
>> digest_0.6.4
>> [10] fail_1.2 foreach_1.4.2
>> GenomicAlignments_1.1.13
>> [13] iterators_1.0.7 plyr_1.8.1
>> Rcpp_0.11.2
>> [16] RCurl_1.95-4.1 Rsamtools_1.17.23
>> RSQLite_0.11.4
>> [19] sendmailR_1.1-2 stats4_3.1.0
>> stringr_0.6.2
>> [22] tools_3.1.0 XML_3.98-1.1
>> XVector_0.5.6
>> [25] zlibbioc_1.11.1
>>
>> _________________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> <mailto:Bioconductor at r-project.org>
>> https://stat.ethz.ch/mailman/__listinfo/bioconductor
>> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>> Search the archives:
>> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>
>>
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>>
>> _________________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>> https://stat.ethz.ch/mailman/__listinfo/bioconductor
>> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>> Search the archives:
>> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>
>>
>>
>
More information about the Bioconductor
mailing list