[R] How to build a large identity matrix faster?

Uwe Ligges ligges at statistik.tu-dortmund.de
Tue Jun 12 23:53:24 CEST 2012



On 11.06.2012 11:45, Ceci Tam wrote:
> diag(n) is alright when n = 5e3, it took 0.7 sec on my machine for
> diag(5e3). However, it's slow when n = 23000, diag(23000) took 15 sec


1. As others have said already, for huge data, the Matrix implementation 
for diagonals is the right idea, since it actually does know how to 
calculate things without actually using and saving the elements of the 
diagonal explicitly.

2. Such a huge matrix (23000x23000) does not work on all available 
machines and the OP tried to make the example code portable.

3. What timing do you expect?
I'd expect (assuming O(n) for the numbers of elements in the matrix) a 
ratio of round(23000^2/5000^2)=21 and actually you reported round(15/0.7)=21


Best,
Uwe Ligges


> On 11 June 2012 17:43, Ceci Tam<yeechingtam at gmail.com>  wrote:
>
>> diag(n) is alright when n = 5e3, it took 0.7 sec on my machine for
>> diag(5e3). However, it's slow when n = 23000, diag(23000) took 15 sec
>>
>> On 9 June 2012 06:32, R. Michael Weylandt<michael.weylandt at gmail.com>wrote:
>>
>>> On Fri, Jun 8, 2012 at 5:31 PM, R. Michael Weylandt
>>> <michael.weylandt at gmail.com>  wrote:
>>>> For the matter and hand, you can probably get to it by simply trying
>>>> base:::diag. In this case, it's not too hard because what you're
>>>> seeing is the S4 generic that the Matrix package is defining _over_
>>>> the regular base function generic.
>>>
>>> Sorry -- the regular base function is not a generic in this case.
>>> Everything else still holds though. (Same tricks to find things work
>>> with print which becomes both an S3 and S4 generic on loading Matrix)
>>>
>>>>
>>>> More generally, going down the rabbit hole of S4:
>>>>
>>>> As it suggests, first try
>>>>
>>>> showMethods("diag")
>>>>
>>>> and you'll see a long list of types. The parallel to the *.default
>>>> method is the one with signature "ANY" so you can try that:
>>>>
>>>> getMethod("diag", "ANY")
>>>>
>>>> which gets you where you need to be.
>>>>
>>>> Hope this helps,
>>>> Michael
>>>>
>>>> On Fri, Jun 8, 2012 at 5:11 PM, Spencer Graves
>>>> <spencer.graves at structuremonitoring.com>  wrote:
>>>>>       How can one get the source code for diag?  I tried the following:
>>>>>
>>>>>
>>>>>> diag
>>>>> standardGeneric for "diag" defined from package "base"
>>>>>
>>>>> function (x = 1, nrow, ncol)
>>>>> standardGeneric("diag")
>>>>> <environment: 0x0000000009dc1ab0>
>>>>> Methods may be defined for arguments: x, nrow, ncol
>>>>> Use  showMethods("diag")  for currently available ones.
>>>>>
>>>>>
>>>>>       How can I look at the code past the methods dispatch?
>>>>>
>>>>>
>>>>>> methods('diag')
>>>>> [1] diag.panel.splom
>>>>> Warning message:
>>>>> In methods("diag") : function 'diag' appears not to be generic
>>>>>
>>>>>
>>>>>       So "diag" is an S4 generic.  I tried the following:
>>>>>
>>>>>
>>>>>> dumpMethods('diag', file='diag.R')
>>>>>> readLines('diag.R')
>>>>> character(0)
>>>>>
>>>>>
>>>>>       More generally, what do you recommend I read to learn about S4
>>>>> generics?  I've read fair portions of Chambers (1998, 2008), which
>>> produced
>>>>> more frustration than enlightenment for me.
>>>>>
>>>>>
>>>>>       Thanks,
>>>>>       Spencer
>>>>>
>>>>>
>>>>> On 6/8/2012 12:07 PM, Uwe Ligges wrote:
>>>>>>
>>>>>> I quickly looked at it, and the difference comes from:
>>>>>>
>>>>>> n<- 5e3
>>>>>> system.time(x<- array(0, c(n, n))) # from diag()
>>>>>> system.time(x<- matrix(0, n, n))   # from Rdiag()
>>>>>>
>>>>>> Replaced in R-devel.
>>>>>>
>>>>>> Best,
>>>>>> Uwe Ligges
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 07.06.2012 12:11, Spencer Graves wrote:
>>>>>>>
>>>>>>> On 6/7/2012 2:27 AM, Rui Barradas wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> To my great surprise, on my system, Windows 7, R 15.0, 32 bits, an R
>>>>>>>> version is faster!
>>>>>>>
>>>>>>>
>>>>>>> I was also surprised, Windows 7, R 2.15.0, 64-bit
>>>>>>>
>>>>>>>
>>>>>>>> rbind(diag=t1, Rdiag=t2, ratio=t1/t2)
>>>>>>> user.self sys.self elapsed user.child sys.child
>>>>>>> diag 0.72 0.080000 0.81 NA NA
>>>>>>> Rdiag 0.09 0.030000 0.12 NA NA
>>>>>>> ratio 8.00 2.666667 6.75 NA NA
>>>>>>>>
>>>>>>>> sessionInfo()
>>>>>>> R version 2.15.0 (2012-03-30)
>>>>>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>>>>>>
>>>>>>> locale:
>>>>>>> [1] LC_COLLATE=English_United States.1252
>>>>>>> [2] LC_CTYPE=English_United States.1252
>>>>>>> [3] LC_MONETARY=English_United States.1252
>>>>>>> [4] LC_NUMERIC=C
>>>>>>> [5] LC_TIME=English_United States.1252
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] splines stats graphics grDevices utils datasets methods
>>>>>>> [8] base
>>>>>>>
>>>>>>> other attached packages:
>>>>>>> [1] fda_2.2.9 Matrix_1.0-6 lattice_0.20-6 zoo_1.7-7
>>>>>>>
>>>>>>> loaded via a namespace (and not attached):
>>>>>>> [1] grid_2.15.0 tools_2.15.0
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Spencer
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Rdiag<- function(n){
>>>>>>>> m<- matrix(0, nrow=n, ncol=n)
>>>>>>>> m[matrix(rep(seq_len(n), 2), ncol=2)]<- 1
>>>>>>>> m
>>>>>>>> }
>>>>>>>>
>>>>>>>> Rdiag(4)
>>>>>>>>
>>>>>>>> n<- 5e3
>>>>>>>> t1<- system.time(d1<- diag(n))
>>>>>>>> t2<- system.time(d2<- Rdiag(n))
>>>>>>>> all.equal(d1, d2)
>>>>>>>> rbind(diag=t1, Rdiag=t2, ratio=t1/t2)
>>>>>>>>
>>>>>>>>
>>>>>>>> Anyway, why don't you create it once, save a copy and use it many
>>> times?
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>>
>>>>>>>> Rui Barradas
>>>>>>>>
>>>>>>>> Em 07-06-2012 08:55, Ceci Tam escreveu:
>>>>>>>>>
>>>>>>>>> Hello, I am trying to build a large size identity matrix using
>>>>>>>>> diag(). The
>>>>>>>>> size is around 23000 and I've tried diag(23000), that took a long
>>> time.
>>>>>>>>> Since I have to use this operation several times in my program, the
>>>>>>>>> running
>>>>>>>>> time is too long to be tolerable. Are there any alternative for
>>>>>>>>> diag(N)?
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> yct
>>>>>>>>>
>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>> ______________________________________________
>>>>>>>>> R-help at r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list