[R] Question about "sample" function and inconsistent results I am getting across machines.

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Sun May 3 21:32:54 CEST 2020


On 03/05/2020 1:39 a.m., Fomby, Tom wrote:
> Please consider the following code:
> 
> set.seed(1)
> 
> train.index = sample(181,150)
> head(train.index)
> # [1]  49  67 103 162  36 159  Result from my ASUS computer
> #
> # [1]  68 167 129 162 43 14  Result from my wife's HP Pavilion computer
> 
> In both cases, version 3.6.3 of R is being used.
> 
> In addition, of the 20 students in my Predictive Analytics class, 14 got the first result while 6 got the latter result.  These results do not seem to be specific to Mac (macOS) versus PC (Windows).  In several cases, students using 3.6.3 got differing results.  This makes grading homework challenging, because I do not know which partition of the data each student is using.
> 
> Thank you for considering my question.

Likely some of you are storing and restoring workspaces, and have been 
doing so for a long time.  If you type

RNGkind()

what you should see is

[1] "Mersenne-Twister" "Inversion"        "Rejection"

but if the .Random.seed is restored from an old session, you might see

[1] "Mersenne-Twister" "Inversion"        "Rounding"

The latter uses the old, buggy sampling algorithm in sample().  Those users should run

RNGkind(sample.kind = "Rejection")

to start using the corrected sampling algorithm.  (The default was 
changed in R 3.6.0, but if you saved your seed from a previous version, 
you'd get the old sampler.)
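As a rough check of the explanation above, the following sketch draws the same sample under each sampler so the two outputs can be compared with the two results quoted in the question. It assumes R 3.6.0 or later (the `sample.kind` argument does not exist before that); the variable names are illustrative only and are not from the original post.

```r
# Remember the current settings so they can be restored afterwards.
old_kinds <- RNGkind()

# Pre-3.6.0 sampler. R warns that it is non-uniform, so silence that here.
suppressWarnings(RNGkind(sample.kind = "Rounding"))
set.seed(1)
rounding_draw <- head(sample(181, 150))

# Sampler that has been the default since R 3.6.0.
RNGkind(sample.kind = "Rejection")
set.seed(1)
rejection_draw <- head(sample(181, 150))

# Compare these with the two results reported in the question.
print(rounding_draw)
print(rejection_draw)

# Put the session back the way it was found.
suppressWarnings(RNGkind(kind = old_kinds[1],
                         normal.kind = old_kinds[2],
                         sample.kind = old_kinds[3]))
```

If the two printed vectors differ, a student whose homework matches the "Rounding" output is most likely running with a seed restored from an old workspace.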

They should also stop reloading old workspaces, but that's another 
discussion.
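For a class, one possible safeguard (a sketch, not something proposed in the original exchange) is to have every homework script pin all three RNG settings in the same call that sets the seed, so the result no longer depends on what a restored workspace left behind. This assumes R 3.6.0 or later, where set.seed() accepts the `kind`, `normal.kind` and `sample.kind` arguments.

```r
# Set the seed and pin the generator, normal method and sampler together,
# overriding anything inherited from a restored .Random.seed.
set.seed(1,
         kind = "Mersenne-Twister",
         normal.kind = "Inversion",
         sample.kind = "Rejection")

train.index <- sample(181, 150)
head(train.index)

# Confirm which settings are now in effect.
RNGkind()
```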

Duncan Murdoch


