[R] plot - central limit theorem

Sun Oct 19 14:32:57 CEST 2008

I don't know whether showing p-values is the best approach either, but
I'm using them only as an indicator of how good the approximation
becomes as the sample size increases. You may regard the p-values as
a measure of goodness of fit. I don't think I need to address the
hypothesis-testing question -- as Duncan has explained.

Yes, you can generate normal random numbers at the same time and
compare the p-values, but I prefer comparing the sample means with the
theoretical population distribution instead of simulated normal random
numbers.

The problem with most CLT demos is that we have no means of observing
how good the approximation is. In your clt.examp(), there is a graphical
measure, i.e. comparing the density curve to the histogram, but that's
not sufficient, as our eyes cannot always detect differences between
curves, e.g. the t-distribution and the normal distribution.
That's why I use numerical measures like p-values.
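
To illustrate what a numerical measure adds over the picture, here is a
minimal sketch in Python (standard library only). It is not the code
behind the animation: instead of a normality-test p-value it reports the
Kolmogorov-Smirnov distance between the simulated sample means and the
normal distribution with the theoretical parameters. The Exponential(1)
population, the function name ks_distance and the defaults are
assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def ks_distance(n, reps=2000, seed=1):
    """KS distance between simulated means of n Exponential(1) draws
    and the normal with the theoretical mean 1 and sd 1/sqrt(n)."""
    rng = random.Random(seed)
    target = NormalDist(mu=1.0, sigma=1.0 / math.sqrt(n))
    means = sorted(
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(reps)
    )
    d = 0.0
    for i, x in enumerate(means):
        f = target.cdf(x)
        # The empirical CDF jumps from i/reps to (i+1)/reps at x,
        # so check the gap on both sides of the jump.
        d = max(d, f - i / reps, (i + 1) / reps - f)
    return d


# The distance should shrink as the sample size grows.
for n in (1, 5, 30):
    print(n, round(ks_distance(n), 4))
```

A single number per sample size can be tracked as n grows, which is hard
to do by eye from overlaid curves.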

P.S. I think your code in clt.examp() needs a correction: the
parameters of the theoretical normal distribution should not be
computed from the *simulated* means & variances, but from the original
theoretical distribution. For example, for the mean of n draws from the
uniform distribution over (a, b), mean = (a+b)/2 and
sd = (b-a)/sqrt(12*n) (although with a large number of simulated means
the two sets of values will be very close).
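
The difference between the two sets of parameters can be sketched as
follows, again in Python with the standard library only. The function
names and the values a = 0, b = 1, n = 10 are hypothetical, picked just
for illustration; this is not the code of clt.examp().

```python
import math
import random
import statistics


def theoretical_params(a, b, n):
    """Mean and sd of the mean of n Uniform(a, b) draws."""
    return (a + b) / 2, (b - a) / math.sqrt(12 * n)


def simulated_params(a, b, n, reps=5000, seed=42):
    """Mean and sd estimated from reps simulated sample means."""
    rng = random.Random(seed)
    means = [
        sum(rng.uniform(a, b) for _ in range(n)) / n
        for _ in range(reps)
    ]
    return statistics.mean(means), statistics.stdev(means)


a, b, n = 0.0, 1.0, 10
print("theoretical:", theoretical_params(a, b, n))
print("simulated:  ", simulated_params(a, b, n))
```

The simulated values vary from run to run (here they are fixed only by
the seed), whereas the theoretical ones do not, so the reference curve
stays the same across repetitions of the demo.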

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China

On Thu, Oct 16, 2008 at 11:43 PM, Greg Snow <Greg.Snow at imail.org> wrote:
> I wonder if including the p-values for the normality test is the best approach in your animation?  The clt does not say that the distribution of the means will be normal, just that it approaches normality (and therefore may be a decent approximation).  The normality test can only reject the null that the data (simulated means) come from a normal distribution.  Since the true distribution of the means is not normal (unless you use a sample size of Inf, and I for one have better things to do than wait for a computer to simulate several samples of size Inf) the null for the normality test is always false, and therefore the test will always result in either saying it is not normal or a type II error.  The real goal is not to show normality, but to show that using the normal gives a "good enough" approximation.  I would prefer the bottom plot to show either the proportion of p-values from a normal-based test on the simulated data that are less than alpha, or the proportion of confidence intervals from the normal-based test that include the true parameter.  Then the user can see when those values become a close enough approximation.
>
> What is your target audience for this demo?  In my opinion, anyone who could understand the bottom plot should already understand the clt enough not to need the demo, those that I would aim the demo at would just be confused by the current bottom plot.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>