[Rd] About numerical accuracy: should there be a rational number class for fractions?

Thu Aug 6 21:17:00 CEST 2009

Hello, useRs.

I was monitoring the recent thread about the numerical (in)accuracy of
the C pow function and the apparent mismatch between R and matlab.

That particular problem has been addressed, but it seems there are
many others awaiting. I'm wondering if there is advice for package
writers & prospective R contributors about avoiding the most common
sorts of numerical mistakes.   Has this been discussed here?  I've
googled, finding naught, but perhaps I do not know the magic words.

I have seen two approaches to the numerical accuracy problem
associated with fractions.  I'm not a computer scientist, so perhaps I
over-simplify in interpreting the result.  Both of these approaches
seem to recommend we put off doing division until the last possible
moment.

One is  associated with a computer simulation article

Luis R. Izquierdo and J. Gary Polhill (2006)
Is Your Model Susceptible to Floating-Point Errors?
Journal of Artificial Societies and Social Simulation vol. 9, no. 4
<http://jasss.soc.surrey.ac.uk/9/4/4.html>

 J. Gary Polhill, Luis R. Izquierdo and Nicholas M. Gotts (2005)
The Ghost in the Model (and Other Effects of Floating Point Arithmetic)
Journal of Artificial Societies and Social Simulation vol. 8, no. 1
<http://jasss.soc.surrey.ac.uk/8/1/5.html>

And the other approach can be found in the Gambit program, which is a
tool for use in computing Nash (and other types of) equilibria in game
theory. (http://gambit.sourceforge.net/). Gambit was an NSF funded
project led by Dr. Richard McKelvey, who was one of the leading
mathematical political scientists in the US (before his untimely
death). It has been continued by his students, primarily Ted Turocy.

In Gambit, great care is taken to avoid any floating point
calculations involving fractions until the latest possible step. There
is a rational number class and fractions are added and subtracted in a
more exact way.  I think it "lowest common denominator" or similar
concept. (I remember that detail mainly because the switch from 32 bit
to 64 bit integers caused memory allocation troubles).

(Come to think of it, I believe the famous problem of poor variance
calculation in Microsoft Excel falls into the same category:

A. Talha Yalta, "The accuracy of statistical distributions in
Microsoft® Excel 2007"
Computational Statistics & Data Analysis
Volume 52, Issue 10, 15 June 2008, Pages 4579-4586

Use of Excel for Statistical Analysis
Neil Cox, Statistician, AgResearch Ruakura
http://www.agresearch.co.nz/Science/Statistics/exceluse1.htm

avoid division until the last possible moment!)

In R, we could protect ourselves somewhat if there were a class for
handling rational numbers.  Instead of taking a value like 2/3 and
letting the floating point calculation occur, a function could create
a "rational number object"

myfrac <- rational(2, 3)

And if I'm understanding the inheritance logic of objects in S 4
classes, then the rational class could implement +, -, / .

Well, I'm just wondering what you think about this question. I don't
think I could write this class, but I might be able to help somebody
refine and test such a thing if it existed.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas