[Rd] Non identical numerical results from R code vs C/C++ code?

Barry Rowlingson b.rowlingson at lancaster.ac.uk
Fri Sep 10 13:24:28 CEST 2010


On Fri, Sep 10, 2010 at 11:46 AM, Renaud Gaujoux
<renaud at mancala.cbio.uct.ac.za> wrote:
> Hi,
>
> suppose you have two versions of the same algorithm: one in pure R, the
> other one in C/C++ called via .Call().
> Assuming there is no bug in the implementations (i.e. they both do the same
> thing), is there any well known reason why the C/C++ implementation could
> return numerical results non identical to the one obtained from the pure R
> code? (e.g. could it be rounding errors? please explain.)
> Has anybody had a similar experience?
>
> By not identical, I mean very small differences (< 2.4 e-14), but enough to
> have identical() returning FALSE. Maybe I should not bother, but I want to
> be sure where the differences come from, at least by mere curiosity.
>
> Briefly the R code perform multiple matrix product; the C code is an
> optimization of those specific products via custom for loops, where entries
> are not computed in the same order, etc... which improves both memory usage
> and speed. The result is theoretically the same.

 I've had almost exactly this situation recently with an algorithm I
first implemented in R and then in C. Guess what the problem was? Yes,
a bug in the C code.

 At first I thought everything was okay because the returned values
were close-ish, and I thought 'oh, rounding error', but I wasn't happy
about that. So I dug a bit deeper. This is worth doing even if you are
sure there no bugs in the C code (remember: there's always one more
bug).

 First, construct a minimal dataset so you can demonstrate the problem
with a manageable size of matrix. I went down to 7 data points. Then
get the C to print out its inputs. Identical, and as expected? Good.

 Now debug intermediate calculations, printing or saving from C and
checking they are the same as the intermediate calculations from R. If
possible, try returning intermediate calculations in C and checking
identical() with R intermediates.

 Eventually you will find out where the first diverge - and this is
the important bit. It's no use just knowing the final answers come out
different, it's likely your answer has a sensitive dependence on
initial conditions. You need to track down when the first bits are
different.

 Or it could be rounding error...

Barry



More information about the R-devel mailing list