[R] Difference between 32-bit and 64-bit version

Thu Jun 4 11:53:25 CEST 2015

On 04/06/2015 3:59 AM, Thierry Onkelinx wrote:
> Dear Duncan,
> 
> I had been thinking about FAQ 7.31. I tried to create a dummy dataset
> with the same structure to replicate the problem without the need of
> sending my dataset. However all of them gave identical() results between
> 32-bit and 64-bit. Note that coef()$fRow is a 1266 x 6 data.frame. Is it
> correct to infer that tiny differences between 32-bit and 64-bit are
> possible but have a low probability of occurring?

Differences are rare, but it's hard to assign a probability to them.
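
As a rough illustration of the rounding approach discussed in this thread, here is a minimal sketch. The values x32, x64, noise32 and noise64 and the helper stable() are made up for the example; they are not taken from test.rda. The digest package is only assumed as one possible way to compute a SHA1 hash, so that step is skipped if it is not installed.

```r
# Hypothetical values standing in for one coefficient computed on two
# platforms; they differ only in the last bits of the double.
x32 <- 0.1 + 0.2
x64 <- 0.3

identical(x32, x64)          # bitwise comparison: FALSE
isTRUE(all.equal(x32, x64))  # comparison within a tolerance: TRUE

# Hypothetical helper: keep a modest number of significant digits so that
# last-bit differences disappear before comparing or hashing.
stable <- function(x, digits = 6) {
  if (is.data.frame(x)) {
    # Round numeric columns only; leave factors and characters untouched.
    x[] <- lapply(x, function(col) {
      if (is.numeric(col)) signif(col, digits) else col
    })
    return(x)
  }
  signif(x, digits)
}

identical(stable(x32), stable(x64))  # TRUE

# Caveat: a value that should be exactly 0 but carries platform-dependent
# noise keeps its noise under signif(), so the comparison still fails.
noise32 <- 1.2e-17
noise64 <- -3.4e-17
identical(stable(noise32), stable(noise64))  # FALSE

# Optional hashing step, assuming the digest package is available.
if (requireNamespace("digest", quietly = TRUE)) {
  h32 <- digest::digest(stable(x32), algo = "sha1")
  h64 <- digest::digest(stable(x64), algo = "sha1")
  print(h32 == h64)
}
```

The number of digits is a judgment call: fewer digits hide more platform noise but also hide small genuine changes in the coefficients.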

Duncan Murdoch

> 
> signif() indeed makes more sense than round(). Using 20 digits gives
> identical results, 21 digits gives non-identical results.
> 
> Best regards,
> 
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> 
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to
> say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of
> data. ~ John Tukey
> 
> 2015-06-03 18:09 GMT+02:00 Duncan Murdoch <murdoch.duncan at gmail.com>:
> 
>     On 03/06/2015 11:56 AM, Thierry Onkelinx wrote:
>     > Dear all,
>     >
>     > I'm a bit puzzled by the difference in an object when created in R 32-bit
>     > and R 64-bit.
>     >
>     > Consider the code below. test.rda is available at
>     >
>     > https://drive.google.com/file/d/0BzBrlGSuB9n-NFBWeC1TR093Sms/view?usp=sharing
>     >
>     > # Run in R 3.2.0 Windows 32-bit, lme4 1.1-8
>     > library(lme4)
>     > load("test.rda")
>     > coef.32 <- coef(test)
>     > save(coef.32, file = "32bit.rda")
>     >
>     > # Run in R 3.2.0 Windows 64-bit, lme4 1.1-8
>     > library(lme4)
>     > load("~/test.rda")
>     > coef.64 <- coef(test)
>     > save(coef.64, file = "64bit.rda")
>     >
>     >
>     > # Compare the results
>     > # Run in R 3.2.0 Windows 32-bit, lme4 1.1-8
>     > # Run in R 3.2.0 Windows 64-bit, lme4 1.1-8
>     > library(lme4)
>     > load("32bit.rda")
>     > load("64bit.rda")
>     > identical(coef.32, coef.64) # FALSE
>     > identical(coef.32$fRow, coef.64$fRow) # FALSE
>     > identical(coef.32$fLocation, coef.64$fLocation) # TRUE
>     > identical(coef.32$fSubLocation, coef.64$fSubLocation) # TRUE
>     >
>     > The first comparison is FALSE, because the second is FALSE. But why is the
>     > second FALSE and the third and fourth TRUE?
>     >
>     > My goal is to calculate a SHA1 hash on coef(test) to track whether the
>     > coefficients of test have changed. I'd like to get the same hash on a
>     > 32-bit and 64-bit system. A simple hack would be to calculate the hash on
>     > round(coef(test), 20). Is that a good or bad idea?
>     >
>     > identical(round(coef.32$fRow, 20), round(coef.64$fRow, 20)) # TRUE
> 
>     Different math libraries round differently, so small differences are
>     expected.  This is FAQ 7.31.  In many cases the 32 bit calculations are
>     more accurate, because they tend to use more 80 bit extended precision
>     intermediate values, but that is not guaranteed.
> 
>     Rounding before comparing makes sense, but I would use signif() instead
>     of round(), I would choose a relatively small number of significant
>     digits, and I would expect to see a few false positives:  if the true
>     value is 0 but some "random" noise is added, I'd expect values rounded
>     by signif() to be unequal.
> 
>     Duncan Murdoch
> 