[R] crossprod(x) vs crossprod(x,x)

Duncan Murdoch murdoch at stats.uwo.ca
Tue Nov 21 13:27:22 CET 2006


On 11/21/2006 6:39 AM, Robin Hankin wrote:
> Professor Ripley
> 
> thanks for this.
> 
> I see what you say about gc().  I'll try your looped test on my system:
> 
> 
> 
>  > x <- matrix(rnorm(3000000),ncol=3)
>  >
>  > system.time(for(i in 1:100){crossprod(x)})
> [1]  7.931 20.420 28.619  0.000  0.000
>  > system.time(for(i in 1:100){crossprod(x)})
> [1]  7.975 20.590 29.512  0.000  0.000
>  > system.time(for(i in 1:100){crossprod(x)})
> [1]  8.003 20.641 30.160  0.000  0.000
>  > system.time(for(i in 1:100){crossprod(x,x)})
> [1] 2.350 0.032 2.552 0.000 0.000
>  > system.time(for(i in 1:100){crossprod(x,x)})
> [1] 2.330 0.015 2.333 0.000 0.000
>  > system.time(for(i in 1:100){crossprod(x,x)})
> [1] 2.333 0.034 2.447 0.000 0.000
>  >
> 
> 
> 
> How come my findings are qualitatively different from yours?

Have you said what version of R you're using?  I tried your code in an 
Intel Mac with R 2.4.0, and a Windows machine with R-devel, and saw what 
Brian saw.  Maybe the Power Mac changes this, but that doesn't make a 
lot of sense.  On the other hand, it could be that an older R release is 
making unnecessary copies in one case but not the other.

Duncan Murdoch
> 
> 
> 
> 
> 
> 
> 
>  > Sys.info()
>                                                                          
>                    sysname
>                                                                          
>                   "Darwin"
>                                                                          
>                    release
>                                                                          
>                    "8.8.0"
>                                                                          
>                    version
> "Darwin Kernel Version 8.8.0: Fri Sep  8 17:18:57 PDT 2006;  
> root:xnu-792.12.6.obj~1/RELEASE_PPC"
>                                                                          
>                   nodename
>                                                                          
> "octopus.noc.soton.ac.uk"
>                                                                          
>                    machine
>                                                                          
>          "Power Macintosh"
>                                                                          
>                      login
>                                                                          
>                     "rksh"
>                                                                          
>                       user
>                                                                          
>                     "rksh"
>  >
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On 21 Nov 2006, at 11:17, Prof Brian Ripley wrote:
> 
>> On Tue, 21 Nov 2006, Robin Hankin wrote:
>>
>>> I found out the other day that crossprod() will take a single  
>>> matrix argument;
>>> crossprod(x)  notionally returns crossprod(x,x).
>> It actually says
>>
>>     x, y: matrices: 'y = NULL' is taken to be the same matrix as 'x'.
>>
>> but not that it is the same as crossprod(x,x).
>>
>>> The two forms do not  return identical matrices:
>>>
>>> x <- matrix(rnorm(3000000),ncol=3)
>>> M1 <- crossprod(x)
>>> M2 <- crossprod(x,x)
>>>
>>> R> max(abs(M1-M2))
>>> [1] 1.932494e-08
>>>
>>> But what really surprised me is that crossprod(x) is slower than
>>> crossprod(x,x):
>>>
>>> R> system.time(crossprod(x))
>>> [1] 0.079 0.206 0.292 0.000 0.000
>>> R> system.time(crossprod(x,x))
>>> [1] 0.035 0.001 0.041 0.000 0.000
>> That's not usually the case: your example is too small to be  
>> reliable, and the large system time is suspicious.  I get (in R- 
>> devel, R's internal BLAS)
>>
>>> system.time(crossprod(x))
>>    user  system elapsed
>>   0.034   0.000   0.034
>>> system.time(crossprod(x,x))
>>    user  system elapsed
>>   0.044   0.001   0.044
>>
>> Such small times are subject to a lot of unrepeatability (they  
>> depend on when gc()s happen and hence on the current state of the  
>> gc tuning).  I suggest you try running repeats for a for loop of  
>> 100, e.g.
>>
>>> system.time(for(i in 1:100) crossprod(x))
>>    user  system elapsed
>>   3.602   0.004   4.895
>>> system.time(for(i in 1:100) crossprod(x))
>>    user  system elapsed
>>   3.612   0.015   3.984
>>> system.time(for(i in 1:100) crossprod(x))
>>    user  system elapsed
>>   3.514   0.009   4.727
>>> system.time(for(i in 1:100) crossprod(x,x))
>>    user  system elapsed
>>   5.636   0.013   8.963
>>> system.time(for(i in 1:100) crossprod(x,x))
>>    user  system elapsed
>>   5.255   0.011  10.961
>>
>> (on a heavily used dual-processor Opteron system).
>>
>> For a more realistic example
>>
>>> x <- matrix(rnorm(3000000),ncol=300)
>>> system.time(crossprod(x))
>>    user  system elapsed
>>   2.196   0.001   2.197
>>> system.time(crossprod(x,x))
>>    user  system elapsed
>>   4.609   0.020   8.972
>>
>> or using an optimized BLAS:
>>
>>> x <- matrix(rnorm(3000000),ncol=300)
>>> system.time(crossprod(x))
>>    user  system elapsed
>>   0.398   0.011   0.495
>>> system.time(crossprod(x,x))
>>    user  system elapsed
>>   0.454   0.004   0.479
>>
>> -- 
>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
> --
> Robin Hankin
> Uncertainty Analyst
> National Oceanography Centre, Southampton
> European Way, Southampton SO14 3ZH, UK
>   tel  023-8059-7743
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list