[Rd] Randomness not due to seed

Dirk Eddelbuettel edd at debian.org
Wed Jul 20 18:20:49 CEST 2011


On 20 July 2011 at 18:02, peter dalgaard wrote:
| 
| On Jul 20, 2011, at 15:38 , Dirk Eddelbuettel wrote:
| 
| > 
| > On 20 July 2011 at 14:03, Jeroen Ooms wrote:
| > | >> I think Bill Dunlap's answer addressed it:  the claim appears to be false.
| > | 
| > | Here is another example where there is randomness that is not due to
| > | the seed. On the same machine, the same R binary, but through another
| > | interface. First directly in the shell:
| > | 
| > | > sessionInfo()
| > | R version 2.13.1 (2011-07-08)
| > | Platform: i686-pc-linux-gnu (32-bit)
| > | 
| > | locale:
| > |  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
| > |  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
| > |  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
| > |  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
| > |  [9] LC_ADDRESS=C               LC_TELEPHONE=C
| > | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
| > | 
| > | attached base packages:
| > | [1] stats     graphics  grDevices utils     datasets  methods   base
| > | 
| > | > set.seed(123)
| > | > print(coef(lm(dist~speed, data=cars)),digits=22)
| > |               (Intercept)                     speed
| > | -17.579094890510951643137   3.932408759124087715975
| > 
| > That's PBKAC --- even double precision does NOT get you 22 digits of precision.
| 
| Hmm, yes, but you would expect the SAME function on the SAME data to yield the same floating point number, and give the SAME printout on the SAME R on the SAME hardware... 
| 
| FWIW all the Mac versions that I can access give the same results as the eclipse version.
| 
| Let's look at the numbers side-by-side
| 
| -17.579094890510951643137   3.932408759124087715975
| -17.57909489051087703615    3.93240875912408460735
|                 !                           !
|  12.345678901234567890123   1.234567890123456789012
| 
| so we're seeing differences around the 15th/16th significant digit. This is consistent with a difference of about one unit of least precision in the actual objects, but there could conceivably be other explanations, e.g. the print() function picking up random garbage. Jeroen: Could you save() the results from the two cases, load() them in a new session and compute the difference?

Yes, 15 to 16 is common. I should have added that to my post when I said '22
is too much'. And I did not want to give the impression that nine is what one
gets: nine is the guaranteed minimum as per the libc docs I quoted, but as you
illustrate, 15 to 16 can often be had.

Thanks for the follow-up.

Dirk

 
| > You may want to read up on 'What Every Computer Scientist Should Know About
| > Floating-Point Arithmetic' by Goldberg (a true internet classic) and ponder
| > why the various 'epsilon' tolerances used in general convergence tests are
| > commonly set to one of the constants supplied by the OS and/or its C
| > library. R has
| > 
| >  #define SINGLE_EPS     FLT_EPSILON
| >  [...]
| >  #define DOUBLE_EPS     DBL_EPSILON
| > 
| > in Constants.h. You can then chase the definition of FLT_EPSILON and
| > DBL_EPSILON through your system headers (which is a good exercise).
| > 
| > One place you may end up is the manual; the following is from the GNU libc
| > documentation on "Floating Point Parameters":
| > 
| > FLT_EPSILON
| >     This is the minimum positive floating point number of type float such that
| >     1.0 + FLT_EPSILON != 1.0 is true. It's supposed to be no greater than 1E-5. 
| > 
| > DBL_EPSILON
| > LDBL_EPSILON
| >     These are similar to FLT_EPSILON, but for the data types double and long
| >     double, respectively. The type of the macro's value is the same as the type
| >     it describes. The values are not supposed to be greater than 1E-9.
| > 
| > So there -- nine digits. 
| > 
| > Dirk 
| > 
| > 
| > | # And this is through eclipse (java)
| > | 
| > | > sessionInfo()
| > | R version 2.13.1 (2011-07-08)
| > | Platform: i686-pc-linux-gnu (32-bit)
| > | 
| > | locale:
| > |  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
| > |  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
| > |  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
| > |  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
| > |  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
| > | [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
| > | 
| > | attached base packages:
| > | [1] stats     graphics  grDevices utils     datasets  methods   base
| > | 
| > | other attached packages:
| > | [1] rj_0.5.2-1
| > | 
| > | loaded via a namespace (and not attached):
| > | [1] rJava_0.9-1  tools_2.13.1
| > | 
| > | > set.seed(123)
| > | > print(coef(lm(dist~speed, data=cars)),digits=22)
| > |              (Intercept)                    speed
| > | 
| 
| > | 
| > | ______________________________________________
| > | R-devel at r-project.org mailing list
| > | https://stat.ethz.ch/mailman/listinfo/r-devel
| > 
| > -- 
| > Gauss once played himself in a zero-sum game and won $50.
| >                      -- #11 at http://www.gaussfacts.com
| > 
| 
| -- 
| Peter Dalgaard
| Center for Statistics, Copenhagen Business School
| Solbjerg Plads 3, 2000 Frederiksberg, Denmark
| Phone: (+45)38153501
| Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
| 


