[R] scan seems to modify the data

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Mar 31 20:55:59 CEST 2004


Note that digits in print() corresponds to signif(), not to round().
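
To illustrate the distinction, here is a small sketch with an arbitrary value (the variable names are only for illustration): digits = 3 in print() keeps three significant digits, as signif() does, whereas round() keeps three decimal places.

```r
x <- 0.0123456

# print() treats 'digits' as a count of significant digits
out_print <- capture.output(print(x, digits = 3))
out_print                   # "[1] 0.0123"

# signif() also counts significant digits
out_signif <- signif(x, 3)  # 0.0123

# round() counts decimal places instead
out_round <- round(x, 3)    # 0.012
```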

You need to bring some knowledge of your problem to bear in order to make the call on such issues.
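
One way such knowledge might be encoded is a tolerance on the spread of each variable relative to its size. The following is only a sketch: the helper name safe_cor, the relative test, and the default tolerance sqrt(.Machine$double.eps) are assumptions chosen for illustration, not anything built into R, and the right tolerance depends on the scale and precision of the data at hand.

```r
# Hypothetical helper: return NA when either variable is constant to
# within a relative tolerance, otherwise the usual Pearson correlation.
safe_cor <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  # A variable counts as constant if its standard deviation is
  # negligible compared with its typical magnitude.
  is_const <- function(v) sd(v) <= tol * mean(abs(v))
  if (is_const(x) || is_const(y)) return(NA_real_)
  cor(x, y)
}

# Values equal to 0.25 up to rounding error, mimicking the intercept row
x <- 0.25 + c(0, 1, -1) * .Machine$double.eps / 4
y <- c(0.1, 0.5, 0.3)

res_const <- safe_cor(x, y)                     # treated as constant
res_usual <- safe_cor(c(1, 2, 3), c(2, 4, 6))   # ordinary correlation
```

With the default tolerance the first call returns NA rather than a correlation driven purely by rounding noise, while the second call behaves like cor().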

On Wed, 31 Mar 2004, Stephane DRAY wrote:

> At 13:34 31/03/2004, Prof Brian Ripley wrote:
> 
> >Take a look at formatReal.  scientific thinks 0.251 has 17 digits and
> >0.255 has 3.  It really doesn't make any sense to ask for more precision
> >than you have (.Machine$double.eps) and you do often get spurious
> >errors if you attempt to do so.  So 15 digits is normally safe, but no
> >more.
> >
> >Note that there are decimal -> binary -> decimal conversions and you
> >can't say which one introduced the small changes.
> 
> I completely agree with you. My problem arises when I try to compute a 
> correlation. One of the variables appears to have equal values, but it 
> does not. Hence it has a very low variance, and when I compute its 
> correlation with another variable, that correlation is very high. I wonder 
> whether it would be good to introduce a tolerance threshold. Is it 
> meaningful to produce a correlation when a variance is very low?
> See the example below:
> 
>  > essai=matrix(c(0.266,.234,.005,.481,.1,.009,.4,.155,.255,.2,.34,.43),4,3)
>  > essai2=sweep(essai,2,apply(essai,2,sum),"/")
>  > x=coef(lm(essai2~scale(runif(4))))
>  > x
>                        [,1]      [,2]       [,3]
> (Intercept)     0.25000000 0.2500000 0.25000000
> scale(runif(4)) 0.05307906 0.1330111 0.06936634
>  > cor(x[1,],runif(3))
> [1] 0.932772
>  > var(x)
>             [,1]        [,2]       [,3]
> [1,] 0.01938893 0.011518783 0.01778528
> [2,] 0.01151878 0.006843202 0.01056607
> [3,] 0.01778528 0.010566067 0.01631426
>  > var(x[1,])
> [1] 1.92593e-33
> 
> Obviously, I can introduce this threshold myself, but I wonder whether 15 
> digits is always a good limit to avoid this kind of problem:
> 
>  > cor(round(x[1,],15),runif(3))
> [1] NA
> Warning message:
> The standard deviation is zero in: cor(x, y, na.method, method == "kendall")
> 
> 
> Thanks a lot to all,
> 
> Stéphane DRAY
> -------------------------------------------------------------------------------------------------- 
> 
> Département des Sciences Biologiques
> Université de Montréal, C.P. 6128, succursale centre-ville
> Montréal, Québec H3C 3J7, Canada
> 
> Tel : 514 343 6111 poste 1233
> E-mail : stephane.dray at umontreal.ca
> -------------------------------------------------------------------------------------------------- 
> 
> Web                                          http://www.steph280.freesurf.fr/
> -------------------------------------------------------------------------------------------------- 
> 
> 
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



