[BioC] IRanges/List oddity: do.call of `c` on a list of IRangesList returns "list" only when the list is named

Cook, Malcolm MEC at stowers.org
Fri Dec 14 16:45:36 CET 2012


Herve,

Excellent news!  I look forward to seeing your contrib on R-devel wend its way world-wide.

Beaucoups woohoos and mucho kudos to you,

;)

~Malcolm


 .-----Original Message-----
 .From: Hervé Pagès [mailto:hpages at fhcrc.org]
 .Sent: Thursday, December 13, 2012 6:15 PM
 .To: Cook, Malcolm
 .Cc: 'Michael Lawrence'; 'bioconductor at r-project.org'
 .Subject: Re: [BioC] IRanges/List oddity: do.call of `c` on a list of IRangesList returns "list" only when the list is named
 .
 .On 12/13/2012 12:24 PM, Cook, Malcolm wrote:
 .> Thanks for digging into this, Herve, Michael.
 .>
 .> Herve, I really appreciate your following up on R-devel, such as you
 .> recently did that got mapply 'fixed' to work natively with Bioc's List
 .> and friends (cf. http://developer.r-project.org/blosxom.cgi/R-devel/NEWS)
 .>
 .> I don't think re-defining c as a generic in BioConductor is a good
 .> workaround, for the reasons you mentioned Herve.  The issue will just
 .> crop up again with someone else's non BioC S4 class structure.
 .>
 .> It really is not a BioConductor issue at all.
 .>
 .> If this can also be kicked upstream, that would serve others as well.
 .
 .Glups, I wrote a long answer about the pros and cons of putting stuff
 .in BiocGenerics vs trying to push it into mainstream R. I was about to
 .press the Send button but, before doing so, decided to have a quick
 .look at the source of the methods package (following Michael's suggestion)
 .just to confirm my feeling that this would be a tough one, so tough
 .that my previous workaround would suddenly sound much more appealing.
 .I was psychologically and emotionally prepared to have a rough time,
 .but, surprisingly, I didn't. Here is the patch:
 .
 .hpages at thinkpad:~/biocprojects/c_implicit_generic/R-devel$ svn diff
 .Index: src/library/methods/R/BasicFunsList.R
 .===================================================================
 .--- src/library/methods/R/BasicFunsList.R     (revision 61310)
 .+++ src/library/methods/R/BasicFunsList.R     (working copy)
 .@@ -46,7 +46,7 @@
 .  , "%*%" = function(x, y) standardGeneric("%*%")
 .  , "xtfrm" = function(x) standardGeneric("xtfrm")
 .  ### these have a different arglist from the primitives
 .-, "c" = function(x, ..., recursive = FALSE) standardGeneric("c")
 .+, "c" = function(..., recursive = FALSE) standardGeneric("c")
 .  , "all" = function(x, ..., na.rm = FALSE) standardGeneric("all")
 .  , "any" = function(x, ..., na.rm = FALSE) standardGeneric("any")
 .  , "sum" = function(x, ..., na.rm = FALSE) standardGeneric("sum")
 .
 .Yes, a 1-liner! I did very little testing but it seems to work fine :-)
 .
 .I'll do more testing before I send this to R-devel. Thanks for the
 .encouragement.
 .
 .H.
 .
 .>
 .> Thoughts?
 .>
 .> ~Malcolm
 .>
 .> *From:*Michael Lawrence [mailto:lawrence.michael at gene.com]
 .> *Sent:* Thursday, December 13, 2012 11:13 AM
 .> *To:* Hervé Pagès
 .> *Cc:* Cook, Malcolm; Michael Lawrence; bioconductor at r-project.org
 .> *Subject:* Re: [BioC] IRanges/List oddity: do.call of `c` on a list of
 .> IRangesList returns "list" only when the list is named
 .>
 .> Probably better to bring this issue to the attention of John Chambers.
 .> Since he's invited us to start hacking on the methods package, this
 .> might be a good opportunity to smooth out some of these rough edges.
 .>
 .>
 .> Michael
 .>
 .> On Wed, Dec 12, 2012 at 6:46 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
 .>
 .> Hi Malcolm,
 .>
 .> I'm not sure what the reasons are for the current behaviour
 .> of the c() generic, if they're just historical, or if there
 .> is something deeper, or...
 .>
 .> My view on the "primitive" status of a function is that it should
 .> be an implementation detail, maybe an important one, but a
 .> detail anyway in the sense that being implemented as a .Primitive
 .> or an .Internal or just in plain R should not affect the semantics
 .> of a function. Interestingly there is a short comment in ?.Primitive
 .> suggesting that people's code should not depend on knowing which
 .> functions are primitive because this does change as R evolves.
 .> Unfortunately the reality is very different: there are situations
 .> where you definitely need to know that something is a primitive,
 .> just because argument passing (and consequently method dispatch)
 .> works differently.
 .>
 .> On a more positive note, I found a hack that allows c() to dispatch
 .> on ...:
 .>
 .>    setGeneric("c", signature="...",
 .>      function(..., recursive=FALSE)
 .>          standardGeneric("c"),
 .>      useAsDefault=function(..., recursive=FALSE)
 .>                       base::c(..., recursive=recursive)
 .>    )
 .>
 .> Then:
 .>
 .>    setClass("A", representation(aa="integer"))
 .>
 .>    setMethod("c", "A",
 .>      function(..., recursive=FALSE)
 .>      {
 .>          args <- list(...)
 .>          ans_aa <- unlist(lapply(args, slot, "aa"), use.names=FALSE)
 .>          new("A", aa=ans_aa)
 .>      }
 .>    )
 .>
 .>    > a1 <- new("A", aa=1:3)
 .>    > a2 <- new("A", aa=22:25)
 .>
 .>    > c(a1, a2)
 .>    An object of class "A"
 .>    Slot "aa":
 .>    [1]  1  2  3 22 23 24 25
 .>
 .>    > c(a1, x=a2)
 .>    An object of class "A"
 .>    Slot "aa":
 .>    [1]  1  2  3 22 23 24 25
 .>
 .>    > c(A=a1, B=a2)
 .>    An object of class "A"
 .>    Slot "aa":
 .>    [1]  1  2  3 22 23 24 25
 .>
 .> Overriding base::c() with our own c() is pretty invasive though and
 .> I didn't test it enough to guarantee that it doesn't break or slow
 .> things down.
 .>
 .> Also one important thing to note is that this signature doesn't
 .> allow specific methods to implement extra arguments (like the "c"
 .> method for GenomicRanges does), which kind of makes sense because
 .> the generic function is putting named args that are not named
 .> 'recursive' in ..., and dispatches on them. The same restriction
 .> applies to the cbind() and rbind() generics:
 .>
 .>    > setMethod("cbind", "A", function(..., deparse.level=1, my.toggle=FALSE) NULL)
 .>    Creating a generic function for 'cbind' from package 'base' in the global environment
 .>    in method for 'cbind' with signature '"A"': no definition for class "A"
 .>    Error in rematchDefinition(definition, fdef, mnames, fnames, signature) :
 .>      arguments (deparse.level) after '...' in the generic must appear in the method, in the same place at the end of the argument list
 .>
 .> So some of the "c" methods would need to be revisited.
 .>
 .> Anyway, would need serious testing before adding this generic to
 .> BiocGenerics. Is it worth it?
 .>
 .> Cheers,
 .> H.
 .>
 .>
 .>
 .>
 .> On 12/03/2012 12:11 PM, Cook, Malcolm wrote:
 .>
 .>     Steve, Michael, Herve, all
 .>
 .>     As always, "illuminating".
 .>
 .>     And, as often, frustrating.
 .>
 .>     I am clear how unname serves as a workaround for my current purpose.
 .>     So, I can proceed.
 .>
 .>     But, I remain unclear if this (to me, odd) behavior of `base::c` is
 .>     desirable or justifiable in any sense of the word.  Is this informed by
 .>     a rational language design, or, as Mike suggests, the result of layering
 .>     an OO design onto a functional base?
 .>
 .>     In your opinion, should this issue be raised on R-devel?  Or is it a
 .>     "waste of time"?
 .>
 .>     Thanks for your thoughts/help.
 .>
 .>     ~Malcolm
 .>
 .>     *From:*Michael Lawrence [mailto:lawrence.michael at gene.com]
 .>     *Sent:* Monday, December 03, 2012 11:31 AM
 .>     *To:* Hervé Pagès
 .>     *Cc:* Cook, Malcolm; bioconductor at r-project.org
 .>     *Subject:* Re: [BioC] IRanges/List oddity: do.call of `c` on a list of
 .>     IRangesList returns "list" only when the list is named
 .>
 .>     On Fri, Nov 30, 2012 at 3:28 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
 .>
 .>     Hi Malcolm,
 .>
 .>     The problem you are describing can be reproduced by calling c()
 .>     directly on S4 objects.
 .>
 .>         * With unnamed arguments:
 .>
 .>           > c(IRanges(), IRanges())
 .>           IRanges of length 0
 .>
 .>           > c(Rle(), Rle())
 .>           logical-Rle of length 0 with 0 runs
 .>             Lengths:
 .>             Values :
 .>
 .>         * With named arguments:
 .>
 .>           > c(a=IRanges(),b=IRanges())
 .>           $a
 .>           IRanges of length 0
 .>
 .>           $b
 .>           IRanges of length 0
 .>
 .>           > c(a=Rle(), b=Rle())
 .>           $a
 .>           logical-Rle of length 0 with 0 runs
 .>             Lengths:
 .>             Values :
 .>
 .>           $b
 .>           logical-Rle of length 0 with 0 runs
 .>             Lengths:
 .>             Values :
 .>
 .>     This statement (found in man page for base::c()) is showing what the
 .>     root of the problem is:
 .>
 .>         S4 methods:
 .>
 .>            This function is S4 generic, but with argument list '(x, ...,
 .>            recursive = FALSE)'.
 .>
 .>     Note that, to make things a little bit more confusing, it's not totally
 .>     accurate that c() is an S4 generic, at least not on a fresh session:
 .>
 .>         > isGeneric("c")
 .>         [1] FALSE
 .>
 .>     So my understanding of the above statement is that c() will
 .>     automatically be turned into an S4 generic at the moment you try
 .>     to define an S4 method for it, and, for obscure reasons that I'm not
 .>     sure I understand, the argument list used in the definition of this
 .>     S4 method must start with 'x'. The consequence of all this is that
 .>     dispatch will happen on 'x' so if named arguments are passed with
 .>     a name that is not 'x', dispatch will fail and the default method
 .>     (which is base::c()) will be called :-b
 .>
 .>     This explains why things work as expected in the following situations:
 .>
 .>         > c(IRanges(), b=IRanges())
 .>         IRanges of length 0
 .>
 .>         > c(a=IRanges(), IRanges())
 .>         IRanges of length 0
 .>
 .>         > c(a=IRanges(), x=IRanges())
 .>         IRanges of length 0
 .>
 .>     But when all the arguments are named with names != 'x', then nothing
 .>     is passed to 'x' and dispatch fails.
 .>
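The mechanism described above can be reproduced without touching c() itself. What follows is a minimal sketch under stated assumptions: `combineX` is a hypothetical, non-primitive stand-in generic that borrows the argument list '(x, ..., recursive = FALSE)', and `A` is a toy class. It is not how c() is implemented (that dispatch is done in C); it only illustrates how dispatching on 'x' alone falls through to the default when no argument ends up matched to 'x'.

```r
library(methods)

# Toy class, for illustration only.
setClass("A", representation(aa = "integer"))

# Hypothetical stand-in with the same argument list as the implicit
# c() generic; dispatch is on 'x' only.
setGeneric("combineX",
    function(x, ..., recursive = FALSE) standardGeneric("combineX"),
    signature = "x")

# Fallback, playing the role that base::c() plays for c().
setMethod("combineX", "ANY",
    function(x, ..., recursive = FALSE) "default method")

setMethod("combineX", "A",
    function(x, ..., recursive = FALSE) "method for A")

a1 <- new("A", aa = 1:3)
a2 <- new("A", aa = 22:25)

combineX(a1, b = a2)      # a1 is matched to 'x': "method for A"
combineX(a = a1, a2)      # a2 is matched to 'x': "method for A"
combineX(a = a1, x = a2)  # a2 is matched to 'x': "method for A"
combineX(a = a1, b = a2)  # 'x' is missing:       "default method"
```

In the last call both objects land in '...', so 'x' has class "missing" and only the "ANY" method applies.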
 .>     I didn't have much luck so far with my attempts to work around this:
 .>
 .>         1. Trying to change the signature of the c() generic:
 .>
 .>            > setGeneric("c", signature="...")
 .>            Error in setGeneric("c", signature = "...") :
 .>              'c' is a primitive function;  methods can be defined, but
 .>             the generic function is implicit, and cannot be changed.
 .>
 .>         2. Trying to dispatch on "missing" or "ANY":
 .>
 .>            > setMethod("c", "missing", function(x, ..., recursive=FALSE) "YES!")
 .>            Error in setMethod("c", "missing", function(x, ..., recursive = FALSE) "YES!") :
 .>              the method for function 'c' and signature x="missing" is sealed and cannot be re-defined
 .>
 .>            > setMethod("c", "ANY", function(x, ..., recursive=FALSE) "YES!")
 .>            Error in setMethod("c", "ANY", function(x, ..., recursive = FALSE) "YES!") :
 .>              the method for function 'c' and signature x="ANY" is sealed and cannot be re-defined
 .>
 .>     With old versions of R dispatch on ... was not possible i.e. ... was not
 .>     allowed to be in the signature of the generic. This was changed in
 .>     recent versions of R and we're already using this new feature for a
 .>     few S4 generics defined in BiocGenerics e.g. for cbind() and rbind():
 .>
 .>         > library(BiocGenerics)
 .>         > rbind
 .>         standardGeneric for "rbind" defined from package "BiocGenerics"
 .>
 .>         function (..., deparse.level = 1)
 .>         standardGeneric("rbind")
 .>         <environment: 0x29b96b0>
 .>         Methods may be defined for arguments: ...
 .>         Use  showMethods("rbind")  for currently available ones.
 .>
 .>     And dispatch works as expected, with or without named arguments:
 .>
 .>         > rbind(a=DataFrame(X=1:3, Y=11:13), b=DataFrame(X=1:3, Y=21:23))
 .>         DataFrame with 6 rows and 2 columns
 .>                   X         Y
 .>           <integer> <integer>
 .>         1         1        11
 .>         2         2        12
 .>         3         3        13
 .>         4         1        21
 .>         5         2        22
 .>         6         3        23
 .>
 .>         > rbind(DataFrame(X=1:3, Y=11:13), DataFrame(X=1:3, Y=21:23))
 .>         DataFrame with 6 rows and 2 columns
 .>                   X         Y
 .>           <integer> <integer>
 .>         1         1        11
 .>         2         2        12
 .>         3         3        13
 .>         4         1        21
 .>         5         2        22
 .>         6         3        23
 .>
 .>     So I wonder if the weird behavior of c() is still justified.
 .>
 .>     Comments/suggestions to address this are welcome.
 .>
 .>
 .>
 .>     The issue is that (unlike 'rbind')  'c' is a primitive and dispatch for
 .>     primitives is hard-coded in C. C-level dispatch is a simplified variant
 .>     of the R implementation, so I'm guessing it does not work with "...".
 .>
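The difference in status between the two functions can be checked from the R prompt. This is a small sketch using only base R; it says nothing about how the C-level dispatch works internally, only that c() has no R-level argument list to inspect while rbind() does.

```r
# c() is a primitive: there is no R closure behind it, so argument
# matching and method dispatch are handled in C code.
is.primitive(c)          # TRUE
typeof(c)                # "builtin"
is.null(formals(c))      # TRUE: no R-level formals to inspect

# rbind() is an ordinary closure with a visible argument list.
is.primitive(rbind)      # FALSE
typeof(rbind)            # "closure"
names(formals(rbind))    # "..." "deparse.level"
```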
 .>     Btw, you can get a peek at the 'c' generic with:
 .>       > getGeneric("c")
 .>     standardGeneric for "c" defined from package "base"
 .>
 .>     function (x, ..., recursive = FALSE)
 .>     standardGeneric("c", .Primitive("c"))
 .>     <bytecode: 0x382af20>
 .>     <environment: 0x34d6878>
 .>     Methods may be defined for arguments: x, recursive
 .>     Use  showMethods("c")  for currently available ones.
 .>
 .>     Michael
 .>
 .>          Thanks,
 .>          H.
 .>
 .>
 .>
 .>
 .>          On 11/30/2012 11:56 AM, Cook, Malcolm wrote:
 .>
 .>          Hi,
 .>
 .>          The following shows that do.call of `c` on a list of IRangesList
 .>          returns "list" only when the list is named.
 .>
 .>          library(IRanges)
 .>          example(IRangesList)
 .>          class(x)
 .>
 .>          [1] "CompressedIRangesList"
 .>          attr(,"package")
 .>          [1] "IRanges"
 .>
 .>          class(do.call(c,list(x1=x,x2=x)))
 .>
 .>          [1] "list"
 .>
 .>          I am confused by this.
 .>
 .>          I would not expect the fact that the list is named to have any
 .>          impact on the result.
 .>
 .>          But, look, when the list names are omitted, the class is now an IRangesList:
 .>
 .>          class(do.call(c,list(x,x)))
 .>
 .>          [1] "CompressedIRangesList"
 .>          attr(,"package")
 .>          [1] "IRanges"
 .>
 .>          class(c(x,x))
 .>
 .>          [1] "CompressedIRangesList"
 .>          attr(,"package")
 .>          [1] "IRanges"
 .>
 .>          A 'workaround' is to unname the list, as demonstrated:
 .>
 .>          class(do.call(c,unname(list(x1=x,x2=x))))
 .>
 .>          [1] "CompressedIRangesList"
 .>          attr(,"package")
 .>          [1] "IRanges"
 .>
 .>          But, why does having a 'names' attribute affect the behavior of
 .>          do.calling `c` so much as to change the class returned?
 .>
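Part of the answer is that do.call() passes the names of the list on as argument names, so a named list yields a call in which every argument is named. A minimal base-R sketch, where `show_arg_names` is a hypothetical helper introduced only to make the argument names visible:

```r
# Hypothetical helper: returns the names its arguments were passed under.
show_arg_names <- function(...) names(list(...))

args <- list(x1 = 1:2, x2 = 3:4)

named_call   <- do.call(show_arg_names, args)
unnamed_call <- do.call(show_arg_names, unname(args))

named_call    # "x1" "x2"
unnamed_call  # NULL

# do.call(c, args) is therefore the same call as c(x1 = ..., x2 = ...).
identical(do.call(c, args), c(x1 = 1:2, x2 = 3:4))  # TRUE
```

With unname(), the arguments are passed positionally, which is why the workaround restores the unnamed-call behavior.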
 .>
 .>          Thanks for your help/education.....
 .>
 .>          Malcolm Cook
 .>          Computational Biology - Stowers Institute for Medical Research
 .>
 .>          sessionInfo()
 .>
 .>          R version 2.15.1 (2012-06-22)
 .>          Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
 .>
 .>          locale:
 .>          [1] C
 .>
 .>          attached base packages:
 .>          [1] stats     graphics  grDevices utils     datasets  methods   base
 .>
 .>          other attached packages:
 .>          [1] IRanges_1.16.4     BiocGenerics_0.4.0
 .>
 .>          loaded via a namespace (and not attached):
 .>           [1] AnnotationDbi_1.20.3   BSgenome_1.26.1        Biobase_2.18.0
 .>           [4] Biostrings_2.26.2      DBI_0.2-5              GenomicFeatures_1.10.1
 .>           [7] GenomicRanges_1.10.5   RCurl_1.95-3           RSQLite_0.11.2
 .>          [10] Rsamtools_1.10.2       XML_3.95-0.1           biomaRt_2.14.0
 .>          [13] bitops_1.0-4.2         colorspace_1.2-0       data.table_1.8.6
 .>          [16] functional_0.1         graph_1.36.1           gtools_2.7.0
 .>          [19] parallel_2.15.1        rtracklayer_1.18.1     stats4_2.15.1
 .>          [22] tools_2.15.1           zlibbioc_1.4.0
 .>
 .>
 .>          _______________________________________________
 .>          Bioconductor mailing list
 .>          Bioconductor at r-project.org
 .>          https://stat.ethz.ch/mailman/listinfo/bioconductor
 .>          Search the archives:
 .>          http://news.gmane.org/gmane.science.biology.informatics.conductor
 .>
 .>          --
 .>          Hervé Pagès
 .>
 .>          Program in Computational Biology
 .>          Division of Public Health Sciences
 .>          Fred Hutchinson Cancer Research Center
 .>          1100 Fairview Ave. N, M1-B514
 .>          P.O. Box 19024
 .>          Seattle, WA 98109-1024
 .>
 .>          E-mail: hpages at fhcrc.org
 .>          Phone: (206) 667-5791
 .>          Fax: (206) 667-1319
 .>
 .>
 .>
 .>
 .


