[R] A reproducibility puzzle with NORM

Fri Sep 21 16:12:30 CEST 2007

On 21-Sep-07 13:44:40, Prof Brian Ripley wrote:
> Norm uses a Box-Muller normal RNG, and rngseed does not reset
> its state (it has some Fortran save variables).  So if you ask
> for an odd number of normals and call rngseed, the next normal
> 'generated' is the second half of the last pair with the
> previous seed.
> 
> Ideally packages should be converted to use R's number generators
> which do not have such quirks.

Many thanks! That is indeed a deeply hidden quirk.
(It should be documented!)

Presumably it applies also to 'mix'? I'm even wondering if
it might apply to 'cat' (which I'm supposed to be a maintainer
of); in principle it should not, since there's no unavoidable
necessity to use normal RNs there, hence no need for a
Box-Muller RNG; but one never knows. I'd better look!

Trying to think of a work-round for immediate purposes,
but there seems to be nothing obvious which would avoid
the problem (e.g. testing for odd/even in number of
imputations), since one cannot be sure of how many normal
RNs have been called for during the DA and Imputation
steps.

Maybe one can add a bit of code to rngseed() to administer
a salutary kick to something.

Thanks!
Ted.

> 
> On Fri, 21 Sep 2007, Ted.Harding at manchester.ac.uk wrote:
> 
>> Hi Folks,
>> I'm using the 'norm' package (based on Shafer's NORM)
>> on some  data. In outline, (X,Y) are bivariate normal,
>> var(X)=0.29, var(Y)=24.4, cov(X,Y)=-0.277,
>> there are some 900 cases, and some 170 values of Y
>> have been set "missing" (NA).
>>
>> The puzzle is that, repeating the multiple imputation
>> starting from the same random seed, I get different
>> answers from the repeats depending if I do an odd number
>> of imputations, but the same answer on the repeats
>> if I do en even number (which includes the second
>> repeat of an odd number).
>>
>> It may possibly have something to do with how I've
>> written the code for the loop, but if so then I'm
>> not seeing it!
>>
>> CODE:
>>
>> ## Set up the situation:
>> Data<-read.csv("MyData.csv")
>> X<-Data$X; Y<-Data$Y
>> ##(If you want to try it, set your own data here)
>> Raw<-cbind(X,Y)
>> library(norm)
>>
>> ## Initialise stuff
>> s<-prelim.norm(Raw)
>> t0<-em.norm(s)
>>
>> ##########################
>> ## Set the Random Seed
>> rngseed(31425)
>>
>> ## Do the first imputation:
>> t     <-  da.norm(s,t0,steps=20)
>> Imp   <- imp.norm(s,t, Raw)
>> X.Imp <- Imp[,1]; Y.Imp<-Imp[,2]
>>
>> ## Now do the rest, and accumulate lists of the results
>> ## Est.Imp = list of estimated coeffs
>> ## SE.Imp  = list of SEs of estimated coeffs:
>> Est.Imp <- list(summary(lm(Y.Imp~X.Imp))$coef[,1])
>> SE.Imp  <- list(summary(lm(Y.Imp~X.Imp))$coef[,2])
>> N=4
>> for(i in (2:N)){
>>  t<-da.norm(s,t,steps=20)
>>  Imp<-imp.norm(s,t,Raw)
>>  X.Imp<-Imp[,1]; Y.Imp<-Imp[,2]
>>  Est.Imp<-c(Est.Imp,list(summary(lm(Y.Imp~X.Imp))$coef[,1]))
>>  SE.Imp <-c( SE.Imp,list(summary(lm(Y.Imp~X.Imp))$coef[,2]))
>> }
>>
>> ## Finally, combine the imputations:
>> mi.inference(Est.Imp,SE.Imp)
>>
>>
>> I've illustrated N=4 (even) above, for 4 imputations.
>>
>> Now, I run the code repeatedly from "## Set the Random Seed"
>> down to "mi.inference(Est.Imp,SE.Imp)"
>>
>> With N=4, I always get the same result.
>>
>> If I set N=5, I alternately get different results:
>> The second run is different from the first, but the
>> third is the same as the first, and the fourth is the
>> same as the second, ...
>>
>> In general, for even N, it is as for N=4, and for odd N
>> it is as for N=5.
>>
>> Any ideas???
>>
>> Thanks,
>> Ted.
>>
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 21-Sep-07                                       Time: 14:13:27
>> ------------------------------ XFMail ------------------------------
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 21-Sep-07                                       Time: 15:12:27
------------------------------ XFMail ------------------------------