[R] Question about converting from square roots to decimals

Sun Jan 21 02:25:37 CET 2007

On Sat, 2007-01-20 at 19:53 +0000, Ted.Harding at manchester.ac.uk wrote:
> On 20-Jan-07 Robert Barber wrote:
> > Hi,
> > 
> > I apologize if there is a simple answer to this question that I've
> > missed.  I did search the mailing list but I might not have used the
> > right keywords. Why does sum(A3^2) give the result of 1, but
> > sum(A3^2)==1 give the result of FALSE? 
> > 
> >> A3<-matrix(nrow=3,c(1/(2^.5),1/(2^.5),0))
> >> A3
> >           [,1]
> > [1,] 0.7071068
> > [2,] 0.7071068
> > [3,] 0.0000000
> >> sum(A3^2)
> > [1] 1
> >> sum(A3^2)^.5
> > [1] 1
> >> sum(A3^2)==1         # here's the part I don't understand
> > [1] FALSE
> >> sum(A3^2)^.5==1      # here's the part I don't understand
> > [1] FALSE
> > 
> > I realize that it has something to do with the conversion of the square
> > roots into decimals.  But shouldn't it then give me some number other
> > than 1 as the result for sum(A3^2)?  Are there other ways to do this
> > than what I've tried?  I'm trying to confirm that A3 is a unit vector.
> 
> This is an instance of what must be a candidate for the MFAQAT
> (most frequently asked qustion of all time).
> 
> The nub of the matter can be found in FAQ 7.31:
> 
>   http://www.r-project.org/  -->  FAQs
> 
> where, at 7.31, it says:
> 
>   7.31 Why doesn't R think these numbers are equal?
> 
>   The only numbers that can be represented exactly in R's
>   numeric type are integers and fractions whose denominator
>   is a power of 2. Other numbers have to be rounded to
>   (typically) 53 binary digits accuracy. As a result, two
>   floating point numbers will not reliably be equal unless
>   they have been computed by the same algorithm, and not
>   always even then. For example
> 
>      R> a <- sqrt(2)
>      R> a * a == 2
>      [1] FALSE
>      R> a * a - 2
>      [1] 4.440892e-16
> 
>   The function all.equal() compares two objects using a
>   numeric tolerance of .Machine$double.eps ^ 0.5. If you
>   want much greater accuracy than this you will need to
>   consider error propagation carefully.
> 
>   For more information, see e.g. David Goldberg (1991),
>   "What Every Computer Scientist Should Know About Floating-Point
>   Arithmetic", ACM Computing Surveys, 23/1, 5-48, also
>   available via
>   http://docs.sun.com/source/806-3568/ncg_goldberg.html.
> 
> However, what this FAQ does not point out is that "==" tests
> for exact equality, and even the 'help' page
> 
>   ?"=="
> 
> is not as explicit about this:
> 
>   For numerical values, remember '==' and '!=' do not allow
>   for the finite representation of fractions, nor for
>   rounding error. Using 'all.equal' with 'identical' is
>   almost always preferable.  See the examples.
> 
> Nor does the help
> 
>   ?all.equal
> 
> give the explanation:
> 
>   'all.equal(x,y)' is a utility to compare R objects 'x'
>   and 'y' testing "near equality". ...
> 
> and a user who is not aware of the imprecision problems inherent
> in computer arithmetic may well not see the point of testing for
> "near equality" when the user expects exact equality mathematically.
> 
> Statistically speaking, the frequency of occurrence of variants
> of this question is a phenomenon!
> 
> I hypothesise that it arises from a combination of two main
> circumstances:
> 
> a) Many users who are unfamiliar with the technical details of
>    finite-length binary arithmetic will not be expecting that
>    there could be a problem of this kind in the first place.
>    So, when it occurs, they will simply be puzzled.
> 
> b) It's actually quite difficult to find your way to the above
>    explanation in the FAQs. First, you need to anticipate that
>    this is the sort of thing that will be a FAQ. If you're subject
>    to (a) above, you may not be thinking on these lines.
> 
>    Secondly, even if you get as far as looking at the FAQs at
>    the above URL, you have a lot of scrolling down to do before
>    you find the question
> 
>      7.31 Why doesn't R think these numbers are equal?
> 
>    and even then, if you blink you'll miss it: trying it just
>    now, I in fact didn't spot it in the list of questions
>    immediately below the header
> 
>      7 R Miscellanea
> 
>    even though I already knew it was there somewhere and was
>    actively looking for it. It was only when I landed on the
>    full FAQ itself that I recognised it.
> 
>    It would have been quicker (I just tried that too) to start
>    at the top of the FAQs page, and use the browser's Search
>    tool to search for == : the sixth occurrence of "==" was it!
> 
>    But, even then, you still need to be thinking that the answer
>    is to be found in connection with "==". If you're subject to
>    (a), you'll be wondering instead why the answer was wrong.
> 
> This is such a frequently occurring issue that I feel there is
> a case for a prominently displayed link, maybe in FAQs but maybe
> better at the top level of r-project.org, to a brief but adequate
> discourse with a title like
> 
>   Arithmetic [im]precision and apparent errors in R
> 
> What do others think?
> 
> Ted.

Ted,

I think in Robert's case, and perhaps in others as well, there are two
confounding issues.

The first is the issue that we all have pointed to, which is the
representation of floating point numbers on binary computers.

The second, which tends to come up relatively frequently as well, is
understanding the difference between the precision of what is displayed
in R versus the precision of numeric objects as they are actually stored
internally.

For example, if Robert had seen:

options(digits = 20)
A3 <- matrix(nrow=3,c(1/(2^.5),1/(2^.5),0))

> sum(A3^2)
[1] 0.99999999999999977796

> sum(A3^2)^.5
[1] 0.99999999999999988898

I suspect that his questions might have been different.

I am not advocating changing the default for 'digits', just pointing out
that the two issues are somewhat inter-related in being a source of
confusion for useRs.

In addition, I would put some level of blame on the exposure to the use
of spreadsheets, which is a whole different matter and has come up on
the list frequently as well. 

Principally, if Excel and now unfortunately, OO.org's Calc, did not
include the modification of the IEEE754 standard, whereby numbers "close
to zero" are rounded to 0, users might have a differing appreciation for
how numbers are represented and displayed on binary computers. For the
sake of completeness, Gnumeric does not engage in such deviant behavior.

I won't belabor the point, but in Calc, the cell formula:

  =0.5 - 0.4 - 0.1 

displays (manually increasing to 20 places):

  0.00000000000000000000

Whereas in R:

> 0.5 - 0.4 - 0.1 
[1] -2.775558e-17

Thanks to Excel, users are insulated from such details, whereas in R,
users are expected to be more savvy in such things and to understand
what they are doing in more detail.

I think that arguments could be made as to the relative frequency of
such queries and whether certain questions should be given more or less
prominence. Two thoughts however to address your points:

1. Perhaps the Miscellanea section of the FAQ is better labelled as
"Unexpected Results in R and Miscellanea".

2. Since we tend to refer folks to the R Posting Guide and there is of
course a link to it at the bottom of all posts, perhaps some brief
comments on both floating point representations and precision issues
(display versus storage) could be added to a new section there.

That's my US\$ $\displaystyle e^{i\pi} + \sum_{n=1}^\infty \frac{1}{2^n} + 2(10^{-2})

:-)

Regards,

Marc