[R] Precision in R

Thu Jan 15 20:06:42 CET 2009

Hi there,
Thanks for the help. 
I see now where my results are coming from.
Marlon.

-----Original Message-----
From: Charles C. Berry [mailto:cberry at tajo.ucsd.edu] 
Sent: Friday, 16 January 2009 6:26 a.m.
To: dos Reis, Marlon
Cc: David Winsemius; r-help at r-project.org
Subject: RE: [R] Precision in R

On Thu, 15 Jan 2009, dos Reis, Marlon wrote:

> I'm using:
> "solve(a, b, tol, LINPACK = FALSE, ...)"
> Therefore,tol ==.Machine$double.eps
>> .Machine$double.eps
> [1] 2.220446e-16
>
> It explains why 'solve' works for 'tcp1' but not for 'tcp2':
>
> eigen(tcp1)$values
> [1]  5.208856e+09  2.585816e+08 -3.660671e-06
>
> -3.660671e-06/5.208856e+09
> [1] -7.027783e-16
>
> eigen(tcp2)$values
> [1]  5.208856e+09  2.585816e+08 -6.416393e-08
> -6.416393e-08/5.208856e+09
> [1] -1.231824e-17
>
> My question would be why 'tcrossprod' and 'crossprod' leads to such
> difference?

Thanks for posting your data, Marlon.

This is what I get on windows XP:

> tcp1-tcp2
               [,1]          [,2]          [,3]
[1,] -2.861023e-06 -4.768372e-07 -4.768372e-07
[2,] -4.768372e-07 -3.814697e-06  2.622604e-06
[3,] -4.768372e-07  2.622604e-06 -5.960464e-08
>

but on my Gentoo Linux Intel Core 2 Duo:

  print(tcp1-tcp2,digits=20)
      [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
>

So, it would seem to be that the floating point calcs on my Windows
box, David's Mac, and whatever system you are using are not as
accurate as they might be.

It is well known that rounding error can play havoc with crossproducts 
(particularly when the range of magnitudes of the numbers is wide as it
is 
here), which is why matrix decompositions are generally used for solving

least squares problems.

The internal code for crossprod and tcrossprod is different, so the
computations are likely done in a different order, which would account
for the difference in tcp1 and tcp2 . Ultimately, array.c and blas.f
SUBROUTINE
DSYRK have the code if you want to dig into it.

So to summarize: the difference is not due to R per se, but in the
limited accuracy of floating point calcs on the system used.

HTH,

Chuck

>
> Marlon.
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Thursday, 15 January 2009 6:04 p.m.
> To: Charles C. Berry
> Cc: dos Reis, Marlon; r-help at r-project.org
> Subject: Re: [R] Precision in R
>
> I am seeing different behavior than don Reis on my installation as
well:
> mtx is the same as his WBtWB
>
> > mtx <- matrix(c(1916061939, 2281366606,  678696067, 2281366606,
> 3098975504, 1092911209, 678696067, 1092911209,  452399849), ncol=3)
> >
> > mtx
>            [,1]       [,2]       [,3]
> [1,] 1916061939 2281366606  678696067
> [2,] 2281366606 3098975504 1092911209
> [3,]  678696067 1092911209  452399849
> > eigen(mtx)
> $values
> [1]  5.208856e+09  2.585816e+08 -4.276959e-01
>
> $vectors
>            [,1]       [,2]       [,3]
> [1,] -0.5855545 -0.7092633  0.3925195
> [2,] -0.7678140  0.3299775 -0.5491599
> [3,] -0.2599763  0.6229449  0.7378021
>
> > rcond(mtx)
> [1] 5.33209e-11
>
> Despite a very ill-conditioned matrix, solve still proceeds
> >
> > solve(mtx)
>            [,1]       [,2]       [,3]
> [1,] -0.3602361  0.5039933 -0.6771204
> [2,]  0.5039933 -0.7051189  0.9473348
> [3,] -0.6771204  0.9473348 -1.2727543
> >
> > sessionInfo()
> R version 2.8.1 Patched (2009-01-07 r47515)
> i386-apple-darwin9.6.0
>
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> snipped package info
>
> -- 
> David Winsemius
> On Jan 14, 2009, at 11:15 PM, Charles C. Berry wrote:
>
>>
>> Marlon,
>>
>> Are you using a current version of R? sessionInfo()?
>>
>> It would help if you had something we could _fully_ reproduce.
>>
>> Taking the _printed_ values you have below (WBtWB) and adding or
>> subtracting what you have printed as the difference of the two
>> visually equal matrices ( say Delta ) , I am able to run
>>
>> 	solve( dat3 )
>> 	solve( WBtWB + Delta )
>> 	solve( WBtWB - Delta )
>> 	solve( WBtWB + 2*Delta )
>> 	solve( WBtWB - 2*Delta )
>>
>> and get the results to agree to 3 significant digits. And perturbing
>> things even more I still get solve() to return a value:
>>
>>> for ( i in 1:1000 ) solve(WBtWB - tcrossprod(rnorm(3)))
>>> for ( i in 1:1000 ) solve(WBtWB + tcrossprod(rnorm(3)))
>>
>>
>> And I cannot get condition numbers anything like what you report:
>>
>>> range(replicate( 10000, 1/kappa(dat3-tcrossprod(matrix(rnorm(9),
>>> 3)))))
>> [1] 5.917764e-11 3.350445e-09
>>>
>>
>>
>> So I am very curious that you got the results that you print below.
>>
>> I suppose that it is possible that the difference between what you
>> report and what I see lies in the numerical libraries (LINPACK/
>> LAPACK) that R calls upon.
>>
>> This was done on a windows XP PC. Here is my sessionInfo()
>>
>>> sessionInfo()
>> R version 2.8.1 Patched (2008-12-22 r47296)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.
>> 1252;LC_MONETARY=English_United States.
>> 1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>
>> HTH,
>>
>> Chuck
>>
>> On Thu, 15 Jan 2009, dos Reis, Marlon wrote:
>>
>>> Dear All,
>>> I'm preparing a simple algorithm for matrix multiplication for a
>>> specific purpose, but I'm getting some unexpected results.
>>> If anyone could give a clue, I would really appreciate.
>>> Basically what I want to do is a simple matrix multiplication:
>>> (WB) %*% t(WB).
>>> The WB is in the disk so I compared to approaches:
>>> -	Load 'WB' using 'read.table' (put it in WB.tmp) and then to the
>>> simple matrix multiplication
>>> WB.tmp%*%t(WB.tmp)
>>>
>>> -	Scan each row of WB and do the cross products 'sum(WB.i*WB.i)'
>>> and 'sum(WB.i*WB.j)', which proper arrangement leads to WBtWB.
>>>
>>> Comparing these two matrices, I get the very similar values, however
>>> when I tried their inverse, WBtWB leads to a singular system.
>>> I've tried different tests and my conclusion is that  my precision
>>> problem is related to cross products 'sum(WB.i*WB.i)' and
>>> 'sum(WB.i*WB.j)'.
>>> Does it makes sense?
>>> Thanks,
>>> Marlon
>>>
>>>
>>>> WB.tmp%*%t(WB.tmp)
>>>          WB.i       WB.i       WB.i
>>> WB.i 1916061939 2281366606  678696067
>>> WB.i 2281366606 3098975504 1092911209
>>> WB.i  678696067 1092911209  452399849
>>>
>>>> WBtWB
>>>          [,1]       [,2]       [,3]
>>> [1,] 1916061939 2281366606  678696067
>>> [2,] 2281366606 3098975504 1092911209
>>> [3,]  678696067 1092911209  452399849
>>>
>>>
>>>> WBtWB-WB.tmp%*%t(WB.tmp)
>>>            WB.i          WB.i          WB.i
>>> WB.i 2.861023e-06  4.768372e-07  4.768372e-07
>>> WB.i 4.768372e-07  3.814697e-06 -2.622604e-06
>>> WB.i 4.768372e-07 -2.622604e-06  5.960464e-08
>>>
>>>> solve(WB.tmp%*%t(WB.tmp))
>>>         WB.i      WB.i       WB.i
>>> WB.i -41692.80  58330.89  -78368.17
>>> WB.i  58330.89 -81608.66  109642.09
>>> WB.i -78368.17 109642.09 -147305.32
>>>
>>>> solve(WBtWB)
>>> Error in solve.default(WBtWB) :
>>> system is computationally singular: reciprocal condition number =
>>> 2.17737e-17
>>>
>>>
>>>
>>>
>>>    WB.tmp<-NULL
>>>    WBtWB<-matrix(NA,n,n)
>>>     for (i in 1:n)
>>>     {
>>>      setwd(Home.dir)
>>>      WB.i<-scan("WB.dat", skip = (i-1), nlines = 1)
>>>      WB.tmp<-rbind(WB.tmp,WB.i)
>>>      WBtWB[i,i]<-sum(WB.i*WB.i)
>>>      if (i<n)
>>>       {
>>>         for (j in (i+1):n)
>>>          {
>>>             setwd(Home.dir)
>>>             WB.j<-scan("WB.dat", skip = (j-1), nlines = 1)
>>>             WBtWB[i,j]<-sum(WB.i*WB.j)
>>>             WBtWB[j,i]<-sum(WB.i*WB.j)
>>>          }
>>>        }
>>>      }
>>>
>>>
>>> =
>>> =
>>>
=====================================================================
>>> Attention: The information contained in this message and...
>>> {{dropped:15}}
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> Charles C. Berry                            (858) 534-2098
>>                                            Dept of Family/Preventive
>> Medicine
>> E mailto:cberry at tajo.ucsd.edu	            UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego
>> 92093-0901
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
=======================================================================
> Attention: The information contained in this message and/or
attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or
privileged
> material. Any review, retransmission, dissemination or other use of,
or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by
AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
>
=======================================================================
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive
Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego
92093-0901

=======================================================================
Attention: The information contained in this message and...{{dropped:12}}