[BioC] second GenomicRanges request - enable query-self comparison (and so ignoreSelf and ignoreRedundant) for GRangesList objects?

Janet Young jayoung at fhcrc.org
Tue Jun 17 20:54:50 CEST 2014


Thanks very much all - yes, I can now get what I want by adding type="any":
     findOverlaps( genes_GRL, type="any", ignoreSelf=TRUE, ignoreRedundant=TRUE)
I guess in the long term that'll be unnecessary? It seems to make sense to have a default of type="any", to be consistent with all the other findOverlaps methods.
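Background on the error quoted further down the thread ("Error in match.arg(type) : 'arg' must be of length 1"): base R's match.arg() accepts a multi-element `arg` only when it is identical to the full vector of choices. In that case it returns the first choice. Any other vector longer than one element is rejected. Michael's diagnosis was that the method handling a GRangesList query with no subject forwarded `type` without first resolving it through match.arg(). The base-R sketch below reproduces that failure mode with hypothetical functions (inner_overlaps, outer_broken, outer_fixed), not the real IRanges/GenomicRanges methods. The shorter default in the two forwarders is an assumed mismatch, chosen only to trigger the error.

```r
# Hypothetical downstream method: resolves 'type' against its own choices.
inner_overlaps <- function(type = c("any", "start", "end", "within", "equal")) {
  match.arg(type)
}

# Hypothetical forwarder with an (assumed) shorter default that passes the
# unresolved vector on: inner_overlaps() receives 4 strings, which are not
# identical to its 5 choices, so match.arg() stops with
# "'arg' must be of length 1".
outer_broken <- function(type = c("any", "start", "end", "within")) {
  inner_overlaps(type = type)
}

# The fix described in the thread: resolve with match.arg() first, so only a
# single string ("any" when 'type' is omitted) is forwarded.
outer_fixed <- function(type = c("any", "start", "end", "within")) {
  type <- match.arg(type)
  inner_overlaps(type = type)
}

msg <- tryCatch(outer_broken(), error = conditionMessage)
print(msg)                    # error message raised by match.arg()
print(outer_fixed())          # the resolved default
print(outer_fixed("within"))  # an explicit choice passes straight through
```

Passing a single string, as with type="any" in the call above, sidesteps the problem even in the broken forwarder, because match.arg() accepts a one-element value that matches one of the choices.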

Janet
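Before ignoreSelf and ignoreRedundant worked for GRangesList queries, the same filtering could be done by hand on the hits from the two-argument call (findOverlaps(genes_GRL, genes_GRL) in the quoted script). The following is a minimal base-R sketch. It assumes the hits have already been pulled into a data frame of integer `query`/`subject` indices (for example via queryHits() and subjectHits()). The helper drop_self_and_redundant() and the toy hit table are hypothetical, and this is not the IRanges implementation.

```r
# Hypothetical helper: mimic ignoreSelf / ignoreRedundant on a table of
# hits from a query-self comparison (query and subject index the same set).
drop_self_and_redundant <- function(hits, ignore_self = TRUE,
                                    ignore_redundant = TRUE) {
  keep <- rep(TRUE, nrow(hits))
  if (ignore_self) {
    # A "self" hit pairs an element with itself.
    keep <- keep & hits$query != hits$subject
  }
  if (ignore_redundant) {
    # i-j and j-i describe the same overlap: key each hit by its unordered
    # pair and keep only the first occurrence of each key.
    key <- cbind(pmin(hits$query, hits$subject),
                 pmax(hits$query, hits$subject))
    keep <- keep & !duplicated(key)
  }
  hits[keep, , drop = FALSE]
}

# Toy hits, sorted by query, as a symmetric self-comparison would return
# them: every element hits itself, 1 overlaps 2, and 3 overlaps 4.
hits <- data.frame(query   = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
                   subject = c(1L, 2L, 1L, 2L, 3L, 4L, 3L, 4L))

filtered <- drop_self_and_redundant(hits)
print(filtered)   # two rows: 1-2 and 3-4
```

Keying on the unordered pair, rather than simply keeping query < subject, means the redundancy step still behaves sensibly when self hits are retained (ignore_self = FALSE).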




On Jun 15, 2014, at 12:35 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:

> Great. Thanks.
> 
> Val
> 
> On 06/14/2014 10:27 PM, Michael Lawrence wrote:
>> A GRangesList is a Vector, so that method should work. It just neglected
>> to resolve the type argument with match.arg() before forwarding. That's
>> now fixed.
>> 
>> Michael
>> 
>> 
>> On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at fhcrc.org>
>> wrote:
>> 
>>    Hi Janet, Michael,
>> 
>>    Janet, were you able to get what you needed by specifying 'type'?
>> 
>>    Michael, which method are you referring to when you say it exists
>>    but is broken? The only method I see that allows a missing 'subject'
>>    is query="Vector". Maybe we need to add query="GRangesList"
>>    subject="missing"?
>> 
>> 
>>    Val
>> 
>> 
>> 
>>    On 06/11/2014 11:15 AM, Michael Lawrence wrote:
>> 
>>        Turns out I added something else. In this case, the method does
>>        exist, but it's just broken. You might have better luck if you
>>        specify the type argument to findOverlaps, like type="any".
>> 
>> 
>>        On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at fhcrc.org>
>>        wrote:
>> 
>>            I'm glad that's on your radar already - nice. I updated all devel
>>            packages this morning - it's not there yet. Here's my new
>>            sessionInfo():
>> 
>>            sessionInfo()
>>
>>            R version 3.1.0 Patched (2014-05-26 r65771)
>>            Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>>            locale:
>>             [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>             [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>             [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>             [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>             [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>            [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>>            attached base packages:
>>            [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>            [8] base
>>
>>            other attached packages:
>>            [1] rtracklayer_1.25.11   GenomicRanges_1.17.18 GenomeInfoDb_1.1.7
>>            [4] IRanges_1.99.15       S4Vectors_0.0.8       BiocGenerics_0.11.2
>>
>>            loaded via a namespace (and not attached):
>>             [1] BatchJobs_1.2            BBmisc_1.6               BiocParallel_0.7.2
>>             [4] Biostrings_2.33.10       bitops_1.0-6             brew_1.0-6
>>             [7] codetools_0.2-8          DBI_0.2-7                digest_0.6.4
>>            [10] fail_1.2                 foreach_1.4.2            GenomicAlignments_1.1.14
>>            [13] iterators_1.0.7          plyr_1.8.1               Rcpp_0.11.2
>>            [16] RCurl_1.95-4.1           Rsamtools_1.17.25        RSQLite_0.11.4
>>            [19] sendmailR_1.1-2          stats4_3.1.0             stringr_0.6.2
>>            [22] tools_3.1.0              XML_3.98-1.1             XVector_0.5.6
>>            [25] zlibbioc_1.11.1
>> 
>>            On Jun 10, 2014, at 3:31 PM, Michael Lawrence
>>            <lawrence.michael at gene.com> wrote:
>>
>>            I think I might have already added this a couple days ago to
>>            devel... tough to keep it all straight in my head
>> 
>> 
>>            On Tue, Jun 10, 2014 at 3:14 PM, Janet Young <jayoung at fhcrc.org>
>>            wrote:
>> 
>>                Hi again,
>>
>>                This is somewhat related to another request I sent earlier this
>>                afternoon. Is it possible to implement query-self comparisons (i.e.
>>                where I specify query but not subject) for GRangesList objects? The
>>                motivation is that I'd like to use the ignoreSelf and ignoreRedundant
>>                options in a GRangesList comparison.
>>
>>                I mentioned in my other request that I'm looking through a set of
>>                genes to find pairs that overlap on opposite strands. I'm now using
>>                findOverlaps on a GRangesList object instead of a GRanges object,
>>                because I only want to look at cases where parts of the final
>>                spliced transcript overlap, not cases where a large intron-containing
>>                gene has a smaller gene nested in one of its introns. When I used
>>                findOverlaps on the entire genes as GRanges objects, my results
>>                included nested pairs of that type. If I instead use the blocks()
>>                function to get just the exons as a GRangesList object, I can
>>                successfully ignore the nested-gene case, which is great. However,
>>                with a GRangesList I can't use the query-self comparison, and
>>                therefore can't access the useful ignoreSelf and ignoreRedundant
>>                options. I know I can work around that with some effort, but it'd be
>>                great to have it as part of the underlying code.
>> 
>>                Again, I've included some code below that should show what I'm
>>                trying to do.
>> 
>>                all the best,
>> 
>>                Janet
>> 
>> 
>>                library(rtracklayer)
>> 
>>                #### get some drosophila genes as a test case:
>>                mySession <- browserSession()
>>                genome(mySession) <- "dm3"
>>                genes <- ucscTableQuery (mySession, track="flyBaseGene",
>>                table="flyBaseGene")
>>                genes <- track(genes)
>> 
>>                #### reduce to a smaller example that contains a gene pair of the
>>                #### type I'm talking about (CG33797-RA is nested inside CG11152-RA)
>>                genes <- genes[148:152]
>> 
>>                #### remove strand info, as a workaround to not being able to
>>                #### specify ignore.strand for a query-self findOverlaps call
>>                strand(genes) <- "*"
>> 
>>                #### using findOverlaps on the genes themselves shows me the nested
>>                #### pair (query=3, subject=4)
>>                findOverlaps(genes, ignoreSelf = TRUE, ignoreRedundant = TRUE)
>> 
>>                #### so I'll use blocks to extract only the exonic portions of the
>>                #### genes as a GRangesList:
>>                genes_GRL <- blocks(genes)
>> 
>>                #### and use findOverlaps on that GRangesList object, first by
>>                #### specifying it as both query and subject in the comparison - this
>>                #### gives me more or less what I want (i.e. it does NOT show the
>>                #### nested pair 3-4), except that there's a bunch of filtering to
>>                #### do later.
>>                findOverlaps(genes_GRL, genes_GRL)
>> 
>>                #### ideally, to help me filter the hits I'd like to be able to use
>>                #### ignoreSelf and ignoreRedundant, but I can only use those if
>>                #### it's a query-self comparison (i.e. only works if no subject is
>>                #### specified)
>>                findOverlaps(genes_GRL, genes_GRL, ignoreSelf = TRUE,
>>                             ignoreRedundant = TRUE)
>>                # Error in .local(query, subject, maxgap, minoverlap, type, select, ...) :
>>                #   unused arguments (ignoreSelf = TRUE, ignoreRedundant = TRUE)
>> 
>>                #### and it looks like findOverlaps is not implemented for the
>>                #### query-self case for GRangesList objects
>>                findOverlaps(genes_GRL)
>>                # Error in match.arg(type) : 'arg' must be of length 1
>> 
>>                sessionInfo()
>>
>>                R version 3.1.0 Patched (2014-05-26 r65771)
>>                Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>>                locale:
>>                 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>                 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>                 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>                 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>                 [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>                [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>>                attached base packages:
>>                [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>                [8] base
>>
>>                other attached packages:
>>                [1] rtracklayer_1.25.11   GenomicRanges_1.17.17 GenomeInfoDb_1.1.6
>>                [4] IRanges_1.99.15       S4Vectors_0.0.8       BiocGenerics_0.11.2
>>
>>                loaded via a namespace (and not attached):
>>                 [1] BatchJobs_1.2            BBmisc_1.6               BiocParallel_0.7.2
>>                 [4] Biostrings_2.33.10       bitops_1.0-6             brew_1.0-6
>>                 [7] codetools_0.2-8          DBI_0.2-7                digest_0.6.4
>>                [10] fail_1.2                 foreach_1.4.2            GenomicAlignments_1.1.13
>>                [13] iterators_1.0.7          plyr_1.8.1               Rcpp_0.11.2
>>                [16] RCurl_1.95-4.1           Rsamtools_1.17.23        RSQLite_0.11.4
>>                [19] sendmailR_1.1-2          stats4_3.1.0             stringr_0.6.2
>>                [22] tools_3.1.0              XML_3.98-1.1             XVector_0.5.6
>>                [25] zlibbioc_1.11.1
>> 
>>                _________________________________________________
>>                Bioconductor mailing list
>>                Bioconductor at r-project.org
>>                https://stat.ethz.ch/mailman/listinfo/bioconductor
>>                Search the archives:
>>                http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
> 


