[R] A reproducibility puzzle with NORM

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Sep 21 15:44:40 CEST 2007


Norm uses a Box-Muller normal RNG, and rngseed does not reset its state
(it has some Fortran save variables).  So if you ask for an odd number of 
normals and call rngseed, the next normal 'generated' is the second half 
of the last pair with the previous seed.

Ideally packages should be converted to use R's number generators which do 
not have such quirks.

On Fri, 21 Sep 2007, Ted.Harding at manchester.ac.uk wrote:

> Hi Folks,
> I'm using the 'norm' package (based on Shafer's NORM)
> on some  data. In outline, (X,Y) are bivariate normal,
> var(X)=0.29, var(Y)=24.4, cov(X,Y)=-0.277,
> there are some 900 cases, and some 170 values of Y
> have been set "missing" (NA).
>
> The puzzle is that, repeating the multiple imputation
> starting from the same random seed, I get different
> answers from the repeats depending if I do an odd number
> of imputations, but the same answer on the repeats
> if I do en even number (which includes the second
> repeat of an odd number).
>
> It may possibly have something to do with how I've
> written the code for the loop, but if so then I'm
> not seeing it!
>
> CODE:
>
> ## Set up the situation:
> Data<-read.csv("MyData.csv")
> X<-Data$X; Y<-Data$Y
> ##(If you want to try it, set your own data here)
> Raw<-cbind(X,Y)
> library(norm)
>
> ## Initialise stuff
> s<-prelim.norm(Raw)
> t0<-em.norm(s)
>
> ##########################
> ## Set the Random Seed
> rngseed(31425)
>
> ## Do the first imputation:
> t     <-  da.norm(s,t0,steps=20)
> Imp   <- imp.norm(s,t, Raw)
> X.Imp <- Imp[,1]; Y.Imp<-Imp[,2]
>
> ## Now do the rest, and accumulate lists of the results
> ## Est.Imp = list of estimated coeffs
> ## SE.Imp  = list of SEs of estimated coeffs:
> Est.Imp <- list(summary(lm(Y.Imp~X.Imp))$coef[,1])
> SE.Imp  <- list(summary(lm(Y.Imp~X.Imp))$coef[,2])
> N=4
> for(i in (2:N)){
>  t<-da.norm(s,t,steps=20)
>  Imp<-imp.norm(s,t,Raw)
>  X.Imp<-Imp[,1]; Y.Imp<-Imp[,2]
>  Est.Imp<-c(Est.Imp,list(summary(lm(Y.Imp~X.Imp))$coef[,1]))
>  SE.Imp <-c( SE.Imp,list(summary(lm(Y.Imp~X.Imp))$coef[,2]))
> }
>
> ## Finally, combine the imputations:
> mi.inference(Est.Imp,SE.Imp)
>
>
> I've illustrated N=4 (even) above, for 4 imputations.
>
> Now, I run the code repeatedly from "## Set the Random Seed"
> down to "mi.inference(Est.Imp,SE.Imp)"
>
> With N=4, I always get the same result.
>
> If I set N=5, I alternately get different results:
> The second run is different from the first, but the
> third is the same as the first, and the fourth is the
> same as the second, ...
>
> In general, for even N, it is as for N=4, and for odd N
> it is as for N=5.
>
> Any ideas???
>
> Thanks,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 21-Sep-07                                       Time: 14:13:27
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list