[R] Using R to illustrate the Central Limit Theorem

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Fri Apr 22 14:10:19 CEST 2005


On 21-Apr-05 Bill.Venables at csiro.au wrote:
> Here's a bit of a refinement on Ted's first suggestion.
> [ corrected from runif(M*k), N, k) to runif(N*k), N, k) ]
>  
>  N <- 10000
>  graphics.off()
>  par(mfrow = c(1,2), pty = "s")
>  for(k in 1:20) {
>     m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k)
>     hist(m, breaks = "FD", xlim = c(-4,4), main = k,
>             prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
>     pu <- par("usr")[1:2]
>     x <- seq(pu[1], pu[2], len = 500)
>     lines(x, dnorm(x), col = "red")
>     qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
>     abline(0, 1, col = "red")
>     Sys.sleep(1)
>   }

Very nice! (I can better keep up with it mentally, though, with
Sys.sleep(2) or Sys.sleep(3), which moght be better for classroom
demo).

One thing occurred to me, watching it: people might say "Yes,
we can see how the distribution -> Normal, nice and smooth,
especially in the tails and side-arms; but the peaks always look
a bit rough."

Which could be the cue for introducing "SD(ni) = sqrt(E[ni])",
and the following hack of the above code seems to show this OK
in the "rootograms":

N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:20) {
   m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k)
   hm <- hist(m, breaks = "FD", xlim = c(-4,4), main = k, plot=FALSE,
           prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
   hm$counts<-sqrt(hm$counts) ; 
   plot(hm,xlim = c(-4,4),main = k,ylab="sqrt(Frequency)")
   pu <- par("usr")[1:2]
   x <- seq(pu[1], pu[2], len = 500)
   lines(x, sqrt(N*dnorm(x)*(hm$breaks[2]-hm$breaks[1])), col = "red")
   qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
   abline(0, 1, col = "red")
   Sys.sleep(2)
}

(and also shows clearly how the tails of the sample move outwards
into the tails of the Normal, as in fact you expect from the finite
range of mean(runif(k)), especially initially: very visible for
k up to about 5, and not really settled down for k<10).

Next stop: Hanging rootograms!

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Apr-05                                       Time: 13:10:19
------------------------------ XFMail ------------------------------




More information about the R-help mailing list