[Rd] Why is the diag function so slow (for extraction)?

Steve Bronder sbronder at stevebronder.com
Mon May 11 19:10:02 CEST 2015


Sorry if this is a re-post, not sure if my original message got though. Is
it possible to replace c() with .subset()? Or would this throw errors
because of something specific in c() or as.matrix()? There's a pretty nice
speed up by replacing c() with the .subset().

Ex.
----
library(microbenchmark)


diag2 <- function(x,nrow, ncol){
  if (is.matrix(x)) {
    if (nargs() > 1L)
      stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix")
    if ((m <- min(dim(x))) == 0L)
      return(vector(typeof(x), 0L))
    # replace this part
    y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1))
    nms <- dimnames(x)
    if (is.list(nms) && !any(sapply(nms, is.null)) && identical((nm <-
nms[[1L]][seq_len(m)]),

nms[[2L]][seq_len(m)]))
      names(y) <- nm
    return(y)
  }
  if (is.array(x) && length(dim(x)) != 1L)
    stop("'x' is an array, but not one-dimensional.")
  if (missing(x))
    n <- nrow
  else if (length(x) == 1L && nargs() == 1L) {
    n <- as.integer(x)
    x <- 1
  }
  else n <- length(x)
  if (!missing(nrow))
    n <- nrow
  if (missing(ncol))
    ncol <- n
}

# Tests

nc  <- 1e04

set.seed(1)
m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)

runoff <- microbenchmark(
  diaga <- diag(m),
  diagb <- diag2(m)
)

runoff

Unit: microseconds
expr        min         lq        mean      median        uq        max
neval
diaga 429033.896 434186.694 512143.6728 503355.5865 572811.11 656035.584
100
diagb    216.112    251.445    536.8531    688.3595    706.98   2437.921
100


Regards,

Steve Bronder
Website: stevebronder.com
Phone: 412-719-1282
Email: sbronder at stevebronder.com


On Thu, May 7, 2015 at 11:49 AM, Steve Bronder <sbronder at stevebronder.com>
wrote:

> Is it possible to replace c() with .subset()? Example below
>
> ####
> ####
>
> library(microbenchmark)
>
>
> diag2 <- function(x,nrow, ncol){
>   if (is.matrix(x)) {
>     if (nargs() > 1L)
>       stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix")
>     if ((m <- min(dim(x))) == 0L)
>       return(vector(typeof(x), 0L))
>     # replace this part
>     y <- .subset(x,1 + 0L:(m - 1L) * (dim(x)[1L] + 1))
>     nms <- dimnames(x)
>     if (is.list(nms) && !any(sapply(nms, is.null)) && identical((nm <-
> nms[[1L]][seq_len(m)]),
>
> nms[[2L]][seq_len(m)]))
>       names(y) <- nm
>     return(y)
>   }
>   if (is.array(x) && length(dim(x)) != 1L)
>     stop("'x' is an array, but not one-dimensional.")
>   if (missing(x))
>     n <- nrow
>   else if (length(x) == 1L && nargs() == 1L) {
>     n <- as.integer(x)
>     x <- 1
>   }
>   else n <- length(x)
>   if (!missing(nrow))
>     n <- nrow
>   if (missing(ncol))
>     ncol <- n
> }
>
> nc  <- 10
>
> set.seed(1)
> m <- matrix(sample(letters,nc^2,replace=TRUE), ncol = nc)
>
>
> runoff <- microbenchmark(
>   diaga = diag(m),
>   diagb = diag2(m)
> )
>
> Regards,
>
> Steve Bronder
> Website: stevebronder.com
> Phone: 412-719-1282
> Email: sbronder at stevebronder.com
>
>
> On Tue, May 5, 2015 at 9:46 AM, <luke-tierney at uiowa.edu> wrote:
>
>> Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure
>> why the change was made many years ago, but this was before names were
>> handled explicitly. It would definitely be better to not force the
>> duplicate, at least in the case where we are sure c() and [ would not
>> dispatch.
>>
>> Best,
>>
>> luke
>>
>> On Mon, 4 May 2015, peter dalgaard wrote:
>>
>>
>>>  On 04 May 2015, at 19:59 , franknarf <by.hook.or at gmail.com> wrote:
>>>>
>>>> But I'm still wondering why diag() uses c()...? With it being so slow,
>>>> I'd
>>>> be inclined to write a qdiag() without the c() and just use that the
>>>> next
>>>> time I need matrix algebra. Any insight would be appreciated; thanks!
>>>>
>>>
>>> Well, there are two possibilities: Either it is deliberate or it isn't.
>>>
>>> The latter isn't too unlikely, given that the effect is seen for large
>>> matrices. I would appear to be a matter of O(n) (picking out n items) vs.
>>> O(n^2) (copying an n x n matrix), but this might drown out in a context
>>> involving matrix multiplication and/or inversion, both of which are O(n^3).
>>>
>>> If it is deliberate, the question is why. There could be devils in the
>>> details; notice in particular that c() strips off non-name attributes.
>>> However, I'm not aware of a situation where such attributes could cause
>>> trouble.
>>>
>>> -pd
>>>
>>>
>>>
>> --
>> Luke Tierney
>> Ralph E. Wareham Professor of Mathematical Sciences
>> University of Iowa                  Phone:             319-335-3386
>> Department of Statistics and        Fax:               319-335-3017
>>    Actuarial Science
>> 241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
>> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list