[R] precision, incorrect(?) tapply() NA's

David James dj at research.bell-labs.com
Mon Nov 20 16:16:35 CET 2000


Hi,

Summary:

I ran into some unexpected behavior in approx() and tapply()
that introduced NA's in "clean" data due to (?) numerical accuracy/round off.
The culprit seems to be in match() that coerces it's arguments to character, 
loosing precision in the process.  

[R development version 1.2.0, 08 Nov 2000, on Linux]

Example:

> r
> [1] 0.6931472 0.6931472 0.6931472 0.6931472 0.6931472 1.3373177 1.8052529
> [8] 3.3902202 4.5529401

the 5th element is not exactly equal to the first 4 elements:

> diff(r)
> [1] 0.000000e+00 0.000000e+00 0.000000e+00 1.110223e-16 6.441705e-01
> [6] 4.679352e-01 1.584967e+00 1.162720e+00

which agrees with unique(r)

> unique(r)
> [1] 0.6931472 0.6931472 1.3373177 1.8052529 3.3902202 4.5529401

(note that .Machine$double.eps is [1] 2.220446e-16, twice the 4th diff.)

Now the following call to approx() erroneously introduces an NA:

> approx(r, rec.p, ties = min)
Error in approx(r, rec.p, ties = "min") : approx(): attempted to interpolate NA
values

where the value of rec.p is 
> rec.p
[1] 0.050 0.100 0.250 0.500 0.750 0.900 0.950 0.990 0.999

Looking at approx we see a call to tapply, which is producing the NA

> tapply(rec.p, r, min)
0.693147180559945 0.693147180559945  1.33731770200756  1.80525289211574 
            0.050                NA             0.900             0.950 
3.39022015484073  4.55294011989587 
           0.990             0.999 

Upon inspection, we see that tapply coerces its INDEX (second) argument
to factors, which in turn does a match():

> match(r, unique(r))
[1] 1 1 1 1 1 3 4 5 6

which shows that match() incorrectly assigned the 5th entry in r to the first
element of unique(r) -- not the 2nd one. (We also see that index 2 does not
appear on the output of match() at all.)

Thus match() seems to be introducing the round off as it coerces its 
arguments to character:

> match
function (x, table, nomatch = NA) 
.Internal(match(as.character(x), as.character(table), nomatch))

I'm enclosing the "r" and "rec.p" vectors as binary R data. I wish I could
send the ASCII versions, but then I would be introducing rounding
and the example above could not be reproduced.

By the way, I *do* realize that it's my (i.e., user) responsibility
to deal myself with such accuracy/precision issues (which I wishfully 
thought I was by specifying "ties = min" to approx).  Nevertheless, there 
seems to be some inconsistency somewhere.

With regards,

David A. James
Statistics Research, Room 2C-253            Phone:  (908) 582-3082       
Bell Labs, Lucent Technologies              Fax:    (908) 582-3340
Murray Hill, NJ 09794-0636
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/octet-stream
Size: 381 bytes
Desc: tmp.Rdata
Url : https://stat.ethz.ch/pipermail/r-help/attachments/20001120/8241c4c3/attachment.obj


More information about the R-help mailing list