[BioC] IRanges/List oddity: do.call of `c` on a list of IRangesList returns "list" only when the list is named

Hervé Pagès hpages at fhcrc.org
Fri Dec 14 01:15:01 CET 2012


On 12/13/2012 12:24 PM, Cook, Malcolm wrote:
> Thanks for digging into this, Herve, Michael.
>
> Herve, I really appreciate your following up on R-devel, such as you
> recently did that got mapply ‘fixed’ to work natively with Bioc’s List
> and friends (cf. http://developer.r-project.org/blosxom.cgi/R-devel/NEWS)
>
> I don’t think re-defining c as a generic in BioConductor is a good
> workaround, for the reasons you mentioned Herve.  The issue will just
> crop up again with someone else’s non BioC S4 class structure.
>
> It really is not a BioConductor issue at all.
>
> If this can also be kicked upstream, that would serve others as well.

Glups, I wrote a long answer about the pros and cons of putting stuff
in BiocGenerics vs trying to push it into mainstream R. I was about to
press the Send button but, before doing so, decided to have a quick
look at the source of the methods package (following Michael's suggestion)
just to confirm my feeling that this would be a tough one, so tough
that my previous workaround would suddenly sound much more appealing.
I was psychologically and emotionally prepared to have a rough time,
but, surprisingly, I didn't. Here is the patch:

hpages at thinkpad:~/biocprojects/c_implicit_generic/R-devel$ svn diff
Index: src/library/methods/R/BasicFunsList.R
===================================================================
--- src/library/methods/R/BasicFunsList.R	(revision 61310)
+++ src/library/methods/R/BasicFunsList.R	(working copy)
@@ -46,7 +46,7 @@
  , "%*%" = function(x, y) standardGeneric("%*%")
  , "xtfrm" = function(x) standardGeneric("xtfrm")
  ### these have a different arglist from the primitives
-, "c" = function(x, ..., recursive = FALSE) standardGeneric("c")
+, "c" = function(..., recursive = FALSE) standardGeneric("c")
  , "all" = function(x, ..., na.rm = FALSE) standardGeneric("all")
  , "any" = function(x, ..., na.rm = FALSE) standardGeneric("any")
  , "sum" = function(x, ..., na.rm = FALSE) standardGeneric("sum")

Yes, a 1-liner! I did very little testing but it seems to work fine :-)
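
A quick way to see which argument list a given R build uses for the
implicit c() generic is to inspect its formals. This is only a sketch:
on an unpatched build the first formal should be 'x' (as in the
getGeneric("c") output quoted further down); that a build with the
patch applied would list '...' first is an assumption based on the
diff, not something tested here.

```r
library(methods)

# Fetch the implicit S4 generic that stands behind the primitive c().
g <- getGeneric("c")

# Formal argument names of the generic; the first one is what matters
# for dispatch on named arguments.
arg_names <- names(formals(g))
print(arg_names)

# The arguments methods may be defined on.
print(g@signature)
```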

I'll do more testing before I send this to R-devel. Thanks for the
encouragement.

H.

>
> Thoughts?
>
> ~Malcolm
>
> *From:*Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Thursday, December 13, 2012 11:13 AM
> *To:* Hervé Pagès
> *Cc:* Cook, Malcolm; Michael Lawrence; bioconductor at r-project.org
> *Subject:* Re: [BioC] IRanges/List oddity: do.call of `c` on a list of
> IRangesList returns "list" only when the list is named
>
> Probably better to bring this issue to the attention of John Chambers.
> Since he's invited us to start hacking on the methods package, this
> might be a good opportunity to smooth out some of these rough edges.
>
>
> Michael
>
> On Wed, Dec 12, 2012 at 6:46 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Malcolm,
>
> I'm not sure what the reasons are for the current behaviour
> of the c() generic, if they're just historical, or if there
> is something deeper, or...
>
> My view on the "primitive" status of a function is that it should
> be an implementation detail, maybe an important one, but a
> detail anyway in the sense that being implemented as a .Primitive
> or an .Internal or just in plain R should not affect the semantics
> of a function. Interestingly there is a short comment in ?.Primitive
> suggesting that people's code should not depend on knowing which
> functions are primitive because this does change as R evolves.
> Unfortunately the reality is very different: there are situations
> where you definitely need to know that something is a primitive,
> just because argument passing (and consequently method dispatch)
> works differently.
>
> On a more positive note, I found a hack that allows c() to dispatch
> on ...:
>
>    setGeneric("c", signature="...",
>      function(..., recursive=FALSE)
>          standardGeneric("c"),
>      useAsDefault=function(..., recursive=FALSE)
>                       base::c(..., recursive=recursive)
>    )
>
> Then:
>
>    setClass("A", representation(aa="integer"))
>
>    setMethod("c", "A",
>      function(..., recursive=FALSE)
>      {
>          args <- list(...)
>          ans_aa <- unlist(lapply(args, slot, "aa"), use.names=FALSE)
>          new("A", aa=ans_aa)
>      }
>    )
>
>    > a1 <- new("A", aa=1:3)
>    > a2 <- new("A", aa=22:25)
>
>    > c(a1, a2)
>    An object of class "A"
>    Slot "aa":
>    [1]  1  2  3 22 23 24 25
>
>    > c(a1, x=a2)
>    An object of class "A"
>    Slot "aa":
>    [1]  1  2  3 22 23 24 25
>
>    > c(A=a1, B=a2)
>    An object of class "A"
>    Slot "aa":
>    [1]  1  2  3 22 23 24 25
>
> Overriding base::c() with our own c() is pretty invasive though, and
> I didn't test it enough to guarantee that it doesn't break or slow
> things down.
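
The dispatch-on-'...' idea can also be tried without masking base::c()
at all, by giving the generic a different name. In the sketch below the
generic combineAll() and the class "Toy" are hypothetical names made up
for illustration; only the signature="..." mechanism comes from the
thread. It shows that named arguments passed through do.call() still
reach the S4 method.

```r
library(methods)

# "Toy" is a hypothetical class used only for this illustration.
setClass("Toy", representation(aa = "integer"))

# Hypothetical generic that dispatches on "...", so argument names
# play no role in method selection.
setGeneric("combineAll",
    function(..., recursive = FALSE) standardGeneric("combineAll"),
    signature = "..."
)

# Selected when every object passed in "..." is a Toy.
setMethod("combineAll", "Toy",
    function(..., recursive = FALSE) {
        args <- list(...)
        new("Toy", aa = unlist(lapply(args, slot, "aa"), use.names = FALSE))
    }
)

t1 <- new("Toy", aa = 1:3)
t2 <- new("Toy", aa = 22:25)

unnamed <- do.call(combineAll, list(t1, t2))
named   <- do.call(combineAll, list(x1 = t1, x2 = t2))

print(class(unnamed))
print(class(named))
```

The price is that callers must use the new name instead of c(), which
is exactly the trade-off discussed here.
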
>
> Also one important thing to note is that this signature doesn't
> allow specific methods to implement extra arguments (like the "c"
> method for GenomicRanges does), which kind of makes sense because
> the generic function is putting named args that are not named
> 'recursive' in ..., and dispatches on them. The same restriction
> applies to the cbind() and rbind() generics:
>
>    > setMethod("cbind", "A", function(..., deparse.level=1, my.toggle=FALSE) NULL)
>    Creating a generic function for ‘cbind’ from package ‘base’ in the global environment
>    in method for ‘cbind’ with signature ‘"A"’: no definition for class “A”
>    Error in rematchDefinition(definition, fdef, mnames, fnames, signature) :
>      arguments (deparse.level) after '...' in the generic must appear in the method, in the same place at the end of the argument list
>
> So some of the "c" methods would need to be revisited.
>
> Anyway, this would need serious testing before adding the generic to
> BiocGenerics. Is it worth it?
>
> Cheers,
> H.
>
>
>
>
> On 12/03/2012 12:11 PM, Cook, Malcolm wrote:
>
>     Steve, Michael, Herve, all
>
>     As always, “illuminating”.
>
>     And, as often, frustrating.
>
>     I am clear how unname serves as a workaround for my current purpose.
>     So, I can proceed.
>
>     But I remain unclear whether this (to me, odd) behavior of `base::c` is
>     desirable or justifiable in any sense of the word.  Is this informed by
>     a rational language design, or is it, as Mike suggests, the result of
>     layering an OO design onto a functional base?
>
>     In your opinion, should this issue be raised on R-devel?  Or is it a
>     “waste of time”?
>
>     Thanks for your thoughts/help.
>
>     ~Malcolm
>
>     *From:*Michael Lawrence [mailto:lawrence.michael at gene.com]
>     *Sent:* Monday, December 03, 2012 11:31 AM
>     *To:* Hervé Pagès
>     *Cc:* Cook, Malcolm; bioconductor at r-project.org
>     *Subject:* Re: [BioC] IRanges/List oddity: do.call of `c` on a list of
>     IRangesList returns "list" only when the list is named
>
>     On Fri, Nov 30, 2012 at 3:28 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     Hi Malcolm,
>
>     The problem you are describing can be reproduced by calling c()
>     directly on S4 objects.
>
>         * With unnamed arguments:
>
>           > c(IRanges(), IRanges())
>           IRanges of length 0
>
>           > c(Rle(), Rle())
>           logical-Rle of length 0 with 0 runs
>             Lengths:
>             Values :
>
>         * With named arguments:
>
>           > c(a=IRanges(),b=IRanges())
>           $a
>           IRanges of length 0
>
>           $b
>           IRanges of length 0
>
>           > c(a=Rle(), b=Rle())
>           $a
>           logical-Rle of length 0 with 0 runs
>             Lengths:
>             Values :
>
>           $b
>           logical-Rle of length 0 with 0 runs
>             Lengths:
>             Values :
>
>     This statement (found in man page for base::c()) is showing what the
>     root of the problem is:
>
>         S4 methods:
>
>            This function is S4 generic, but with argument list ‘(x, ...,
>            recursive = FALSE)’.
>
>     Note that, to make things a little bit more confusing, it's not totally
>     accurate that c() is an S4 generic, at least not on a fresh session:
>
>         > isGeneric("c")
>         [1] FALSE
>
>     So my understanding of the above statement is that c() will
>     automatically be turned into an S4 generic at the moment you try
>     to define an S4 method for it, and, for obscure reasons that I'm not
>     sure I understand, the argument list used in the definition of this
>     S4 method must start with 'x'. The consequence of all this is that
>     dispatch will happen on 'x' so if named arguments are passed with
>     a name that is not 'x', dispatch will fail and the default method
>     (which is base::c()) will be called :-b
>
>     This explains why things work as expected in the following situations:
>
>         > c(IRanges(), b=IRanges())
>         IRanges of length 0
>
>         > c(a=IRanges(), IRanges())
>         IRanges of length 0
>
>         > c(a=IRanges(), x=IRanges())
>         IRanges of length 0
>
>     But when all the arguments are named with names != 'x', then nothing
>     is passed to 'x' and dispatch fails.
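
The dispatch-on-'x' behaviour described above can be explored without
any Bioconductor package. The following is a minimal sketch with a
hypothetical toy class "Toy" (not part of IRanges). It checks only the
cases where an argument reaches 'x', plus the unname() workaround; the
class returned in the all-named case is printed rather than asserted,
since it depends on how the R build in use handles c().

```r
library(methods)

# "Toy" is a hypothetical class used only for this illustration.
setClass("Toy", representation(aa = "integer"))

# Method on the implicit generic for c(); dispatch happens on 'x'.
setMethod("c", "Toy",
    function(x, ..., recursive = FALSE) {
        args <- list(x, ...)
        new("Toy", aa = unlist(lapply(args, slot, "aa"), use.names = FALSE))
    }
)

t1 <- new("Toy", aa = 1:3)
t2 <- new("Toy", aa = 22:25)

# The first unnamed argument fills 'x', so the Toy method is used.
unnamed <- c(t1, t2)

# Workaround: drop the list names before do.call().
via_unname <- do.call(c, unname(list(p = t1, q = t2)))

# All arguments named, none of them 'x': result depends on the R build.
all_named <- c(p = t1, q = t2)

print(class(unnamed))
print(class(via_unname))
print(class(all_named))
```
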
>
>     I didn't have much luck so far with my attempts to work around this:
>
>         1. Trying to change the signature of the c() generic:
>
>            > setGeneric("c", signature="...")
>            Error in setGeneric("c", signature = "...") :
>              ‘c’ is a primitive function;  methods can be defined, but
>             the generic function is implicit, and cannot be changed.
>
>         2. Trying to dispatch on "missing" or "ANY":
>
>            > setMethod("c", "missing", function(x, ..., recursive=FALSE) "YES!")
>            Error in setMethod("c", "missing", function(x, ..., recursive = FALSE) "YES!") :
>              the method for function ‘c’ and signature x="missing" is sealed and cannot be re-defined
>
>            > setMethod("c", "ANY", function(x, ..., recursive=FALSE) "YES!")
>            Error in setMethod("c", "ANY", function(x, ..., recursive = FALSE) "YES!") :
>              the method for function ‘c’ and signature x="ANY" is sealed and cannot be re-defined
>
>     With old versions of R dispatch on ... was not possible i.e. ... was not
>     allowed to be in the signature of the generic. This was changed in
>     recent versions of R and we're already using this new feature for a
>     few S4 generics defined in BiocGenerics e.g. for cbind() and rbind():
>
>         > library(BiocGenerics)
>         > rbind
>         standardGeneric for "rbind" defined from package "BiocGenerics"
>
>         function (..., deparse.level = 1)
>         standardGeneric("rbind")
>         <environment: 0x29b96b0>
>         Methods may be defined for arguments: ...
>         Use  showMethods("rbind")  for currently available ones.
>
>     And dispatch works as expected, with or without named arguments:
>
>         > rbind(a=DataFrame(X=1:3, Y=11:13), b=DataFrame(X=1:3, Y=21:23))
>         DataFrame with 6 rows and 2 columns
>                   X         Y
>           <integer> <integer>
>         1         1        11
>         2         2        12
>         3         3        13
>         4         1        21
>         5         2        22
>         6         3        23
>
>         > rbind(DataFrame(X=1:3, Y=11:13), DataFrame(X=1:3, Y=21:23))
>         DataFrame with 6 rows and 2 columns
>                   X         Y
>           <integer> <integer>
>         1         1        11
>         2         2        12
>         3         3        13
>         4         1        21
>         5         2        22
>         6         3        23
>
>     So I wonder if the weird behavior of c() is still justified.
>
>     Comments/suggestions to address this are welcome.
>
>
>
>     The issue is that (unlike 'rbind')  'c' is a primitive and dispatch for
>     primitives is hard-coded in C. C-level dispatch is a simplified variant
>     of the R implementation, so I'm guessing it does not work with "...".
>
>     Btw, you can get a peek at the 'c' generic with:
>       > getGeneric("c")
>     standardGeneric for "c" defined from package "base"
>
>     function (x, ..., recursive = FALSE)
>     standardGeneric("c", .Primitive("c"))
>     <bytecode: 0x382af20>
>     <environment: 0x34d6878>
>     Methods may be defined for arguments: x, recursive
>     Use  showMethods("c")  for currently available ones.
>
>     Michael
>
>          Thanks,
>          H.
>
>
>
>
>          On 11/30/2012 11:56 AM, Cook, Malcolm wrote:
>
>          Hi,
>
>          The following shows that do.call of `c` on a list of IRangesList
>          returns "list" only when the list is named.
>
>          library(IRanges)
>          example(IRangesList)
>          class(x)
>
>          [1] "CompressedIRangesList"
>          attr(,"package")
>          [1] "IRanges"
>
>          class(do.call(c,list(x1=x,x2=x)))
>
>          [1] "list"
>
>          I am confused by this.
>
>          I would not expect the fact that the list is named to have any
>          impact on the result.
>
>          But look: with the list names omitted, the class is now an IRangesList
>
>          class(do.call(c,list(x,x)))
>
>          [1] "CompressedIRangesList"
>          attr(,"package")
>          [1] "IRanges"
>
>          class(c(x,x))
>
>          [1] "CompressedIRangesList"
>          attr(,"package")
>          [1] "IRanges"
>
>          A 'workaround' is to unname the list, as demonstrated:
>
>          class(do.call(c,unname(list(x1=x,x2=x))))
>
>          [1] "CompressedIRangesList"
>          attr(,"package")
>          [1] "IRanges"
>
>          But, why does having a 'names' attribute affect the behavior of
>          do.calling `c` so much as to change the class returned?
>
>
>          Thanks for your help/education.....
>
>          Malcolm Cook
>          Computational Biology - Stowers Institute for Medical Research
>
>          sessionInfo()
>
>          R version 2.15.1 (2012-06-22)
>          Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
>          locale:
>          [1] C
>
>          attached base packages:
>          [1] stats     graphics  grDevices utils     datasets  methods   base
>
>          other attached packages:
>          [1] IRanges_1.16.4     BiocGenerics_0.4.0
>
>          loaded via a namespace (and not attached):
>           [1] AnnotationDbi_1.20.3   BSgenome_1.26.1        Biobase_2.18.0
>           [4] Biostrings_2.26.2      DBI_0.2-5              GenomicFeatures_1.10.1
>           [7] GenomicRanges_1.10.5   RCurl_1.95-3           RSQLite_0.11.2
>          [10] Rsamtools_1.10.2       XML_3.95-0.1           biomaRt_2.14.0
>          [13] bitops_1.0-4.2         colorspace_1.2-0       data.table_1.8.6
>          [16] functional_0.1         graph_1.36.1           gtools_2.7.0
>          [19] parallel_2.15.1        rtracklayer_1.18.1     stats4_2.15.1
>          [22] tools_2.15.1           zlibbioc_1.4.0
>
>
>          _______________________________________________
>          Bioconductor mailing list
>          Bioconductor at r-project.org
>          https://stat.ethz.ch/mailman/listinfo/bioconductor
>          Search the archives:
>          http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>          --
>          Hervé Pagès
>
>          Program in Computational Biology
>          Division of Public Health Sciences
>          Fred Hutchinson Cancer Research Center
>          1100 Fairview Ave. N, M1-B514
>          P.O. Box 19024
>          Seattle, WA 98109-1024
>
>          E-mail: hpages at fhcrc.org
>          Phone: (206) 667-5791
>          Fax: (206) 667-1319
>
>
>
>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


