[R] rep() fails at times=0.29*100

(Ted Harding) Ted.Harding at wlandres.net
Tue Apr 9 18:56:36 CEST 2013


[See at end]
On 09-Apr-2013 16:11:18 Jorge Fernando Saraiva de Menezes wrote:
> Dear list,
> 
> I have found an unusual behavior and would like to check whether it is a
> possible bug, and whether updating R would fix it. I am not sure if I should
> post it to this mailing list, but I don't know where the R bug tracker is. The
> only mention I found that might relate to this is "If times is a computed
> quantity it is prudent to add a small fuzz." in the rep() help, but I am not
> sure whether it is related to this particular problem.
> 
> Here it goes:
> 
>> rep(TRUE,29)
>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> [28] TRUE TRUE
>> rep(TRUE,0.29*100)
>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> [28] TRUE
>> length(rep(TRUE,29))
> [1] 29
>> length(rep(TRUE,0.29*100))
> [1] 28
> 
> Just to make sure:
>> 0.29*100
> [1] 29
> 
> This behavior seems to be independent of what is being repeated (rep()'s
> first argument)
>> length(rep(1,0.29*100))
> [1] 28
> 
> Also, it occurs only with 0.29:
>> length(rep(1,0.291*100))
> [1] 29
>> for(a in seq(0,1,0.01)) {print(sum(rep(TRUE,a*100)))} # also shows correct values for every value from 0 to 1 except 0.29
> 
> I have confirmed that this behavior happens on more than one machine
> (though I only have the session info for this one):
> 
> 
>> sessionInfo()
> R version 2.15.3 (2013-03-01)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> [1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252    LC_MONETARY=Portuguese_Brazil.1252
> [4] LC_NUMERIC=C                       LC_TIME=Portuguese_Brazil.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] spatstat_1.31-1 deldir_0.0-21   mgcv_1.7-22
> 
> loaded via a namespace (and not attached):
> [1] grid_2.15.3     lattice_0.20-13 Matrix_1.0-11   nlme_3.1-108    tools_2.15.3

The basic issue is, believe it or not, that although apparently:
  0.29*100
  # [1] 29

in "reality":
  0.29*100 == 29
  # [1] FALSE

In other words, as computed by R, 0.29*100 is not exactly equal to 29:

  29 - 0.29*100
  # [1] 3.552714e-15
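
R prints doubles to 7 significant digits by default, which is why
0.29*100 displays as 29. Asking for 17 significant digits (enough to
tell any two doubles apart) shows what is actually stored. A minimal
sketch, using only base R:

```r
x <- 0.29 * 100

print(x)                  # default 7 significant digits: displays 29
print(x, digits = 17)     # full precision: displays 28.999999999999996
sprintf("%.17g", x)       # the same information as a character string

# The gap is exactly one unit in the last place for doubles near 29,
# i.e. 2^-48 (about 3.552714e-15, as printed above).
29 - x == 2^-48           # TRUE
```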

The difference is tiny, but it is enough to make 0.29*100 slightly
smaller than 29. Since rep() truncates a non-integer "times" towards
zero, rep(TRUE,0.29*100) uses the largest integer not exceeding
0.29*100, i.e. 28. Hence the recommendation in ?rep to "add a small
fuzz".
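
as.integer() truncates towards zero in the same way, so it shows
directly which count rep() will end up using. The fuzz value 1e-7
below is an arbitrary illustrative choice, not a documented constant:
it only needs to be much larger than the rounding error and much
smaller than 1.

```r
times <- 0.29 * 100

as.integer(times)                 # 28: truncation towards zero
length(rep(TRUE, times))          # 28: rep() truncates the same way

# A small fuzz (1e-7 chosen arbitrarily for this sketch) lifts the value
# just above 29 without any risk of reaching 30.
fuzz <- 1e-7
as.integer(times + fuzz)          # 29
length(rep(TRUE, times + fuzz))   # 29
```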

On the other hand, rep(1,0.291*100) works as expected, because:

  29 - 0.291*100
  # [1] -0.1

so 0.291*100 is comfortably greater than 29 (and well clear of 30),
and truncating it still gives 29.

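The reason for the small inaccuracy (compared with "mathematical
truth") is that R performs numerical calculations in binary
(double-precision) floating point, and there is no exact binary
representation of 0.29. It is stored as the nearest representable
value, which is very slightly less than 0.29; multiplying by 100
carries that error through, and the product is rounded to
28.999999999999996 rather than 29.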
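
The stored value of 0.29 can be inspected directly. For contrast, 0.25
is 1/4, which binary represents exactly, so the analogous product
comes out exact. A small sketch in base R:

```r
# The double actually stored for the literal 0.29 (17 significant digits)
sprintf("%.17g", 0.29)        # "0.28999999999999998": just below 0.29

# Writing it as 29/100 rounds to the very same double, so it does not help
identical(0.29, 29/100)       # TRUE

# 0.25 = 1/4 has an exact binary representation, so no error appears
0.25 * 100 == 25              # TRUE
```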

If you do need to do this sort of thing (e.g. when the value of
"times" is the result of a calculation), one useful precaution is to
round the result:

  round(0.29*100)
  # [1] 29
  29-round(0.29*100)
  # [1] 0
  length(rep(TRUE,0.29*100))
  # [1] 28
  length(rep(TRUE,round(0.29*100)))
  # [1] 29

(The default for round() is 0 decimal places, i.e. it rounds to a
whole number.)
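
If the computed count should always be a whole number, the rounding
can be wrapped in a check, so that a value far from any integer (a
genuine logic error rather than rounding noise) is reported instead of
being silently rounded. The function name as_count() and its default
tolerance are hypothetical choices for this sketch, not part of base R.

```r
# Hypothetical helper: turn a computed, nominally whole-number count into
# an integer, absorbing tiny floating-point error but refusing values that
# are not close to any whole number.
as_count <- function(x, tol = sqrt(.Machine$double.eps)) {
  r <- round(x)
  if (any(abs(x - r) > tol)) {
    stop("'x' is not within tolerance of a whole number")
  }
  as.integer(r)
}

as_count(0.29 * 100)                       # 29L
length(rep(TRUE, as_count(0.29 * 100)))    # 29
# as_count(0.291 * 100) stops with an error: 29.1 is not a whole number
```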

So, compared with:
  0.29*100 == 29
  # [1] FALSE

we have:
  round(0.29*100) == 29
  # [1] TRUE
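
The same issue affects equality tests generally, not only rep(). For
comparing computed doubles, base R provides all.equal(), which allows a
small relative tolerance (by default sqrt(.Machine$double.eps), about
1.5e-8). Since all.equal() returns a character description rather than
FALSE when the values differ, it is usually wrapped in isTRUE():

```r
x <- 0.29 * 100

x == 29                              # FALSE: exact comparison sees the tiny gap
isTRUE(all.equal(x, 29))             # TRUE: equal within the default tolerance
isTRUE(all.equal(0.291 * 100, 29))   # FALSE: a real difference of about 0.1
```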

Hoping this helps,
Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 09-Apr-2013  Time: 17:56:33
This message was sent by XFMail



More information about the R-help mailing list