# [R] plot - central limit theorem

Duncan Murdoch murdoch at stats.uwo.ca
Thu Oct 16 20:26:23 CEST 2008

On 10/16/2008 11:43 AM, Greg Snow wrote:
> I wonder if including the p-values for the normality test is the best approach in your animation? The clt does not say that the distribution of the means will be normal, just that it approaches normality (and therefore may be a decent approximation). The normality test can only reject the null that the data (simulated means) come from a normal distribution. Since the true distribution of the means is not normal (unless you use a sample size of Inf, and I for one have better things to do than wait for a computer to simulate several samples of size Inf) the null for the normality test is always false and therefore the test will always result in either saying it is not normal or a type II error. The real goal is not to show normality, but to show that using the normal gives a "good enough" approximation. I would prefer the bottom plot to show either the proportion of p-values from a normal based test on the simulated data that is less than alpha, or the proportion of confidence intervals based on the normal based test that include the true parameter. Then the user can see when those values become a close enough approximation.

But the p-value is not the test.  The test comes later, when you
interpret the p-value.  So there's no such thing as a Type II error in
a p-value.  The demo does show that for n < 20 (or whatever), the test
is very likely to reject the null.  After that, it becomes less and less
likely.
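
To put rough numbers on that claim, here is a minimal sketch that estimates how often the Shapiro-Wilk test rejects at each sample size. The exponential population, alpha = 0.05, m = 100 means per test and 200 repetitions are assumptions chosen for illustration, not settings taken from clt.ani().

```r
# Sketch: estimated rejection rate of the Shapiro-Wilk test on simulated means.
# Assumed for illustration: exponential population, alpha = 0.05.
set.seed(42)
alpha <- 0.05
m     <- 100                          # sample means fed to each test
reps  <- 200                          # independent tests per sample size
ns    <- c(2, 5, 10, 20, 50, 100)     # sample sizes to compare

reject_rate <- sapply(ns, function(n) {
  p <- replicate(reps, {
    # m fresh samples of size n, one per row; rowMeans gives the m means
    means <- rowMeans(matrix(rexp(m * n), nrow = m))
    shapiro.test(means)$p.value
  })
  mean(p < alpha)                     # proportion of tests that reject
})
names(reject_rate) <- ns
print(round(reject_rate, 2))
```

The rate should fall as n grows, but it need not settle exactly at alpha, since the means are never exactly normal for finite n.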

My suggestion (and this is a matter of taste) would be to do the tests
independently, rather than using the same dataset plus new observations
each time.  It is hard to understand the behaviour of p-values even
without complicating things by giving a correlated sequence of them.
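
The two schemes can be sketched side by side as follows. This is only an illustration under assumed settings (exponential population, m = 200 means, n from 2 to 50); the "cumulative" version is my reading of "same dataset plus new observations", not code from the package.

```r
# Sketch: correlated versus independent sequences of Shapiro-Wilk p-values.
set.seed(1)
m  <- 200                  # number of sample means per test
ns <- 2:50                 # sample sizes

# Cumulative scheme: each of the m samples grows by one observation per step,
# so the means at n and n + 1 share most of their data.
x <- matrix(rexp(m * max(ns)), nrow = m)      # row i holds sample i
p_cumulative <- sapply(ns, function(n) {
  shapiro.test(rowMeans(x[, 1:n, drop = FALSE]))$p.value
})

# Independent scheme: completely fresh data for every n.
p_independent <- sapply(ns, function(n) {
  shapiro.test(rowMeans(matrix(rexp(m * n), nrow = m)))$p.value
})
```

In the cumulative version, neighbouring p-values tend to move together, which is the complication I would avoid.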

And this is even more a matter of taste:  I'd plot the p-values as
points, not as vertical bars.  Showing that a p-value of 0.8 is twice as
big as a p-value of 0.4 isn't useful for interpreting them.
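
In base graphics the difference is just the `type` argument. A small sketch, with simulated p-values standing in for the demo's (exponential population and m = 200 are assumptions):

```r
# Sketch: plot p-values as points rather than vertical bars.
set.seed(2)
ns <- 2:50
pvals <- sapply(ns, function(n) {
  shapiro.test(rowMeans(matrix(rexp(200 * n), nrow = 200)))$p.value
})

# type = "h" would draw vertical bars from zero; type = "p" draws points only
plot(ns, pvals, type = "p", pch = 19, ylim = c(0, 1),
     xlab = "sample size n", ylab = "Shapiro-Wilk p-value")
abline(h = 0.05, lty = 2)   # reference line at an assumed alpha of 0.05
```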

Duncan Murdoch

>
> What is your target audience for this demo?  In my opinion, anyone who could understand the bottom plot should already understand the clt enough not to need the demo, those that I would aim the demo at would just be confused by the current bottom plot.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Yihui Xie
>> Sent: Wednesday, October 15, 2008 10:51 PM
>> To: roger koenker
>> Cc: r-help
>> Subject: Re: [R] plot - central limit theorem
>>
>> it later.
>>
>> I've also made a demo for the CLT in my package 'animation', in which
>> there's also normality testing for the sample means, because I don't
>> think "bell-shaped" alone means normality - so I performed the
>> Shapiro-Wilk test and plotted the P-values under the demo. See the
>> function clt.ani() in the package 'animation', or
>> http://animation.yihui.name/prob:central_limit_theorem
>>
>> You can use any function to denote the population (specify the
>> argument 'FUN') in clt.ani().
>>
>> Regards,
>> Yihui
>> --
>> Yihui Xie <xieyihui at gmail.com>
>> Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
>> Mobile: +86-15810805877
>> Homepage: http://www.yihui.name
>> School of Statistics, Room 1037, Mingde Main Building,
>> Renmin University of China, Beijing, 100872, China
>>
>>
>>
>> On Thu, Oct 16, 2008 at 4:22 AM, roger koenker <rkoenker at uiuc.edu> wrote:
>> > Galton's 19th century mechanical version of this is the quincunx.  I have a
>> > (very primitive) version of this for R at:
>> >
>> > http://www.econ.uiuc.edu/~roger/courses/476/routines/quincunx.R
>> >
>> >
>> > url:    www.econ.uiuc.edu/~roger            Roger Koenker
>> > email    rkoenker at uiuc.edu            Department of Economics
>> > vox:     217-333-4558                University of Illinois
>> > fax:       217-244-6678                Champaign, IL 61820
>> >
>> >
>> >
>> >> Jörg Groß wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>>
>> >>> Is there a way to simulate a population with R and pull out m samples,
>> >>> each with n values, for calculating m means?
>> >>>
>> >>> I need that kind of data to plot a graphic demonstrating the central
>> >>> limit theorem, and I don't know how to begin.
>> >>>
>> >>> So, perhaps someone can give me some tips and hints on how to start and
>> >>> which functions to use.
>> >>>
>> >>>
>> >>>
>> >>> thanks for any help,
>> >>> joerg
>> >>>
>> >
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help