[Rd] Computing means, variances and sums

Sun Feb 19 20:02:41 CET 2006

Dear Professors Ripley & Murdoch, & others:

	  If this were just an issue of computations like var(rep(0.02, 10)) 
producing something other than 0 on certain platforms (e.g., 
combinations of operating systems and microprocessors), then I would 
suggest it be documented as an FAQ and left as a tool to help explain 
finite precision arithmetic and rounding issues to people who don't yet 
understand those concepts.

	  However, Duncan's comment shows that it is more than that, namely 
that certain DLLs change the precision of the fpu (floating point unit?) 
from 64 to 53 bit matissas AND DON'T RESET THEM.  When this is not 
detected, it could make the difference between a useable answer and 
nonsense in poorly conditioned applications where those 9 bits might be 
important.  For me, that problem is NOT that one occasionally gets 
nonsense from a poorly conditioned compution.  Rather it is that the 
SAME computation could give a useful answer in one case and nonsense a 
few seconds later ON THE SAME COMPUTER, operating system, etc.

	  To test my understanding, I simplified Barry Zajdik's example further:

 > var(rep(.2, 3))
[1] 0
 > RSiteSearch("convert to binary")
A search query has been submitted to http://search.r-project.org
The results page should open in your browser shortly
 > var(rep(.2, 3))
[1] 1.155558e-33
 > sessionInfo()
R version 2.2.1, 2005-12-20, i386-pc-mingw32

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

	  This indicates there is a problem that perhaps should eventually be 
fixed.  However, please do NOT spend time on this because I suggested it 
was an issue.  The conditions under which this would create problems for 
anyone are still so rare that I would not want to alter anyone's work 
priorities for it.

	  spencer graves
p.s.  If my computations are correct, 0.2 = 0*/2 + 0/4 + 1/8 + 1/16 + 
0/32 + 0/64 + 1/128 + 1/256 + 0/512 + 0/1024 + 1/2048 + 1/4096 + ... = 
0.3333333333333h.  Perhaps someone can extend this to an FAQ to help 
explain finite precision arithmetic and rounding issues.

Prof Brian Ripley wrote:
> There has been a recent thread on R-help on this, which resurrected 
> concepts from bug reports PR#1228 and PR#6743.  Since the discussion has 
> included a lot of erroneous 'information' based on misunderstandings of 
> floating-point computations, this is an attempt to set the record 
> straight and explain the solutions adopted.
> 
> The problem was that var(rep(0.02, 10)) was observed to be (on some 
> machines) about 1e-35.  This can easily be explained.
> 
> 0.02 is not an exactly repesented binary fraction.  Repeatedly adding 
> the represented quantity makes increasing rounding errors relative to 
> the exact computation, so (on Sparc Solaris)
> 
>> var(rep(0.02, 10))
> 
> [1] 1.337451e-35
> 
>> options(digits=18)
>> sum(rep(0.02, 10))
> 
> [1] 0.19999999999999998
> 
>> sum(rep(0.02, 10)) -0.2
> 
> [1] -2.775557561562891e-17
> 
>> sum(rep(0.02, 10))/10 -0.02
> 
> [1] -3.469446951953614e-18
> 
>> z <- sum(rep(0.02, 10))/10 -0.02
>> 10*z^2/9
> 
> [1] 1.3374513502689138e-35
> 
> and so the non-zero variance is arising from (x[i] - mean) being non-zero.
> (I did check that was what was happening at C level.)
> 
> There has been talk of other ways to arrange these computations, for 
> example Kahan summation and Welford's algorithm (see Chan & Lewis, 1979, 
> CACM 22, 526-531 and references therein).  However, R already used the 
> two-pass algorithm which is the most accurate (in terms of error bounds) 
> in that reference.
> 
> Why are most people seeing 0?  Because the way computation is done in 
> modern FPUs is not the computation analysed in early numerical analysis 
> papers, including in Chan & Lewis.  First, all FPUs that I am aware of 
> allow the use of guard digits, effectively doing intermediate 
> computations to one more bit than required.  And many use extended 
> precision registers for computations which they can keep in FP 
> registers, thereby keeping another 10 or more bits.  (This includes R on 
> most OSes on ix86 CPUs, the exception being on Windows where the FPU has 
> been reset by some other software. Typically it is not the case for RISC 
> CPUs, e.g. Sparc.)
> 
> The use of extended precision registers invalidates the textbook 
> comparisons of accuracy in at least two ways.  First, the error analysis 
> is different if all intermediate results are stored in extended 
> precision. Second, the simpler the algorithm, the more intermediate 
> results which will remain in extended precision.  This means that (for 
> example) Kahan summation is usually less accurate than direct summation 
> on real-world FPUs.
> 
> There are at least two steps which can be done to improve accuracy.
> One is to ensure that intermediate results are stored in extended 
> precision.  Every R platform of which I am aware has a 'long double' 
> type which can be used.  On ix86 this is the extended precision type 
> used internally in the FPU and so the cost is zero or close to zero, 
> whereas on a Sparc the extra precision is more but there is some cost.  
> The second step is to use iterative refinement, so the final part of 
> mean.default currently is
> 
>     ## sum(int) can overflow, so convert here.
>     if(is.integer(x)) x <- as.numeric(x)
>     ## use one round of iterative refinement
>     res <- sum(x)/n
>     if(is.finite(res)) res + sum(x-res)/n else res
> 
> This is a well-known technique in numerical linear algebra, and improves 
> the accuracy whilst doubling the cost.  (This is about to be replaced by 
> an internal function to allow the intermediate result to be stored in a 
> long double.)
> 
> Note the is.finite(res) there.  R works with the extended IEC60059 (aka 
> IEEE754) quantities of Inf, -Inf and NaN (NA being a special NaN).  The 
> rearranged computations do not work correctly for those quantities.  So 
> although they can be more 'efficient' (in terms of flops), they have to 
> be supplemented by some other calculation to ensure that the specials 
> are handled correctly.  Remember once again that we get both speed and 
> accuracy advantages by keeping computations in FPU registers, so 
> complicating the code has considerable cost.
> 
> R-devel will shortly use long doubles for critical intermediate results 
> and iterative refinement for calculations of means.  This may be slower 
> but it would be an extreme case in which the speed difference was 
> detectable.  Higher accuracy has a cost too: there are several packages 
> (dprep and mclust are two) whose results are critically dependent on 
> fine details of computations and will for example infinite-loop if an 
> optimized BLAS is used on AMD64.
> 
> 
> The choice of algorithms in R is an eclectic mixture of accuracy and 
> speed.  When (some of) R-core decided to make use of a BLAS for e.g. 
> matrix products this produced a large speed increase for those with 
> optimized BLAS (and a small speed decrease for others), but it did 
> result in lower accuracy and problems with NAs etc (and the alternative 
> algorithms have since been added back to cover such cases).  But it 
> seems that nowadays few R users understand the notion of rounding error, 
> and it is easier to make the computations maximally accurate than to 
> keep explaining basic numerical analysis on R-help.  (The problem is 
> thereby just pushed farther down the line, because for example linear 
> regression is not maximally accurate.)
>