[Rd] Why is the diag function so slow (for extraction)?

Henrik Bengtsson henrik.bengtsson at ucsf.edu
Wed May 13 18:20:43 CEST 2015


As kindly pointed out to me (oh my decaying gray matter), is.object()
is better suited for this test;

$ svn diff src/library/base/R/diag.R
Index: src/library/base/R/diag.R
===================================================================
--- src/library/base/R/diag.R   (revision 68345)
+++ src/library/base/R/diag.R   (working copy)
@@ -23,9 +23,11 @@
             stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix")

         if((m <- min(dim(x))) == 0L) return(vector(typeof(x), 0L))
+        nms <- dimnames(x)
+        nrow <- dim(x)[1L]
         ## NB: need double index to avoid overflows.
-        y <- c(x)[1 + 0L:(m - 1L) * (dim(x)[1L] + 1)]
-        nms <- dimnames(x)
+        if (is.object(x)) x <- c(x)

/Henrik

On Tue, May 12, 2015 at 8:24 PM, Henrik Bengtsson
<henrik.bengtsson at ucsf.edu> wrote:
> Along Luke's lines, would(n't) it be enough to look for existence of
> attribute 'class' to decide whether to dispatch or not, i.e. if c() is
> needed or not?  Even without .subset(), there is a remarkable
> improvement.  I think it's worth condition the code on dispatch or
> not.  For example:
>
> [HB-X201]{hb}: svn diff diag.R
> Index: diag.R
> ===================================================================
> --- diag.R      (revision 68345)
> +++ diag.R      (working copy)
> @@ -23,9 +23,11 @@
>              stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix")
>
>          if((m <- min(dim(x))) == 0L) return(vector(typeof(x), 0L))
> +        nms <- dimnames(x)
> +        nrow <- dim(x)[1L]
>          ## NB: need double index to avoid overflows.
> -        y <- c(x)[1 + 0L:(m - 1L) * (dim(x)[1L] + 1)]
> -        nms <- dimnames(x)
> +        if (!is.null(attr(x, "class"))) x <- c(x)
> +        y <- x[1 + 0L:(m - 1L) * (nrow + 1)]
>          if (is.list(nms) && !any(sapply(nms, is.null)) &&
>              identical((nm <- nms[[1L]][seq_len(m)]), nms[[2L]][seq_len(m)]))
>              names(y) <- nm
>
> ?
>
> /Henrik
>
> On Tue, May 12, 2015 at 5:33 AM, Martin Maechler
> <maechler at lynne.stat.math.ethz.ch> wrote:
>>>>>>> Steve Bronder <sbronder at stevebronder.com>
>>>>>>>     on Thu, 7 May 2015 11:49:49 -0400 writes:
>>
>>     > Is it possible to replace c() with .subset()?
>>
>> It would be possible, but I think "entirely" wrong.
>>
>> .subset() is documented to be an internal function not to be
>> used "lightly" and more to the point it is documented to *NOT*
>> dispatch at all.
>>
>> If you read and understood what Peter and Luke wrote, you'd not
>> special case here:
>>
>> diag() should not work only for pure matrices, but for all
>> "matrix-like" objects for which ``the usual methods'' work, such
>> as
>>    as.vector(.), c(.)
>>
>> That's why there has been the c(.) in there.
>>
>> You can always make code faster if you write the code so it only
>> has to work in one special case .. and work there very fast.
>>
>>
>>     > Example below
>>     > ####
>>     > ####
>>
>>     > library(microbenchmark)
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list