[R] family question

Douglas Bates bates at stat.wisc.edu
Thu Aug 31 06:33:13 CEST 2000


In the continuing saga of simulating family sizes from Troels Ring's
demographic problem, I offer a somewhat improved function.  In the previous
function if the process did not terminate in the current sample I
would append a new sample to it and recalculate the cumulative sum.
Obviously this is a waste because you already know that the process
will not terminate in the current sample and you know the total from
this sample.  The new function keeps track of the current total (tot)
and the family size to this point (offset) and only calculates
cumulative sums for the new part.

This function increases the size of chunks on which it is working up
to a threshold (2^20 by default) then fixes the chunk size and
continues until the total size exceeds another threshold (2^25 by
default).  I ran this simulation on cvs.r-project.org, a 400 MHz
Pentium III with 768 MB memory running Linux.  I could simulate
10000 samples in about 11 minutes.  The largest value is over
18,000,000 and there is one NA, which would have been over 32,000,000.

> fam
function(basesize = 16, maxvec = 2^20, maxoffset = 2^25)
{
    ## simulate the family sizes in Troels Ring's problem
    samp <- ifelse(runif(basesize) > 0.5, 1, -1)
    tot <- offset <- 0
    repeat {
        cs <- tot + cumsum(samp)
        moreboys <- cs > 0
        if (any(moreboys)) return(min((offset + seq(along = moreboys))[moreboys]))
        offset <- offset + length(samp)
        if (offset > maxoffset) return(NA)
        tot <- cs[length(cs)]
        samp <- ifelse(runif(min(2*length(samp), maxvec)) > 0.5, 1, -1)
    }
}
> famsamp
function(n = 10000)
{
   val <- integer(n)
   for (i in seq(along = val)) val[i] <- fam()
   val
}
> system.time(fs <- famsamp())
[1] 673.45   1.12 752.76   0.00   0.00
> summary(fs)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
       1        1        1     5091        9 18180000        1 
> sort(fs[fs > 100000])
 [1]   137031   156689   168471   190025   220981   221267   285271   379793
 [9]   423175   549397   571879   741895   766307   958423  1819621  1850911
[17]  2024143  2110585  2749441  2860301 11046833 18181117


-- 
Douglas Bates                            bates at stat.wisc.edu
Statistics Department                    608/262-2598
University of Wisconsin - Madison        http://www.stat.wisc.edu/~bates/
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list