[R] Normality test

Tue May 31 03:31:03 CEST 2011

I was referring to the height of the pdf (probability density function).  The 1st distribution is the simple uniform between 0 and 1; the second is also uniform, but on a discontinuous region.  The first will generate numbers between 0 and 1 with equal probability; the second will be similar, but only up to 0.99, with the very rare value (about 1% of draws) between 999.99 and 1000.
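To make the two densities concrete, here is a minimal R sketch, assuming only base R. The sampler `rsplit_unif()` is a hypothetical helper written for this illustration, not part of TeachingDemos or any other package. It draws from the second distribution, computes the exact means and variances of both, and shows how often a sample contains none of the rare far values.

```r
# Hypothetical helper (not from TeachingDemos or any package): draws from the
# second distribution, whose density is 1 on [0, 0.99] and on [999.99, 1000].
rsplit_unif <- function(n) {
  x <- runif(n)            # start from the plain uniform on [0, 1]
  far <- x > 0.99          # about 1% of draws land in (0.99, 1]
  x[far] <- x[far] + 999   # move them to (999.99, 1000]; density stays 1
  x
}

# Exact moments from the density, summing over the pieces [a, b]:
# E[X] = sum of (b^2 - a^2)/2 and E[X^2] = sum of (b^3 - a^3)/3.
mean_unif  <- 0.5
var_unif   <- 1 / 12
mean_split <- (0.99^2 + 1000^2 - 999.99^2) / 2
var_split  <- (0.99^3 + 1000^3 - 999.99^3) / 3 - mean_split^2
c(mean_split = mean_split, var_split = var_split)   # about 10.49 and 9890

# Chance that a sample of size n has no value from the far piece,
# i.e. looks just like a sample from the plain uniform.
round(0.99^c(10, 50, 300), 3)   # 0.904 0.605 0.049

set.seed(1)
x <- rsplit_unif(300)
sum(x > 1)   # number of far values in this particular sample
```

The means (0.5 vs about 10.49) and variances (1/12 vs about 9890) differ enormously, yet a sample of 10 or 50 values usually contains no far value at all.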

-----Original Message-----
From: Bogaso Christofer [mailto:bogaso.christofer at gmail.com] 
Sent: Sunday, May 29, 2011 12:15 AM
To: Greg Snow; R-help at r-project.org
Subject: RE: [R] Normality test

Hi Greg, please forgive me, but I could not understand one part of your
helpful reply. You said: "distributions where one is uniform between 0 and 1
with height 1; the other also has height 1 between 0 and 0.99, but is also 1
between 999.99 and 1000, zero elsewhere." Can you be more specific about this
2nd distribution? And what do you mean by "height"?

Thanks for your time.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Greg Snow
Sent: 29 May 2011 01:52
To: Robert Baer; Salil Sharma; R-help at r-project.org
Subject: Re: [R] Normality test

To build on Robert's suggestion (which is very good to begin with), you
might consider using the vis.test function in the TeachingDemos package with
the vt.qqnorm function.  This will create the qq plot of your data along
with several other qqplots of normal samples of the same size.  If you
cannot tell which of the plots is your data, then your data is probably
close enough to normal for most practical purposes.  It will give you a
p-value based on your ability to distinguish your data from random normals
if you need one.
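The idea behind vis.test with vt.qqnorm can be imitated in base R as a "lineup". This is a sketch under stated assumptions: `lineup_qq()` is a hypothetical helper, not the TeachingDemos function. It hides the normal QQ plot of your data among QQ plots of normal samples of the same size, with the mean and sd matched to your data. The package's own function is presumably called as `vis.test(x, vt.qqnorm)` (check `?vis.test`). It adds the interactive guessing and the p-value, which this sketch does not.

```r
# Hypothetical base-R lineup in the spirit of vis.test + vt.qqnorm:
# the real data's QQ plot is placed in a random panel among simulated normals.
lineup_qq <- function(x, n_panels = 9) {
  n <- length(x)
  real_pos <- sample.int(n_panels, 1)          # secret panel for the real data
  op <- par(mfrow = c(3, ceiling(n_panels / 3)), mar = c(2, 2, 2, 1))
  on.exit(par(op))                             # restore graphics settings
  for (i in seq_len(n_panels)) {
    # real data in the secret panel, normal samples of the same size elsewhere
    y <- if (i == real_pos) x else rnorm(n, mean(x), sd(x))
    qqnorm(y, main = paste("Panel", i))
    qqline(y)
  }
  invisible(real_pos)                          # look only after guessing
}

set.seed(42)
x <- rnorm(300, mean = 10, sd = 2)   # stand-in for your own 300 values
pos <- lineup_qq(x)
# Guess which panel is the real data, then compare with: pos
```

If you cannot reliably pick out your own panel, the departure from normality is probably too small to matter for most practical purposes.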

If you need more precision, then the most precise normality test is
SnowsPenultimateNormalityTest also in TeachingDemos.  However, the
documentation for that function tends to be more useful than the function
itself.

If you really want to choose among the different normality tests in nortest
(or elsewhere), then you should investigate what assumptions they make and
which types of alternatives they are most powerful against.  Also decide
which types of non-normality you really care about, then use that to choose
among them.  Consider two distributions: one is uniform between 0 and 1 with
height 1; the other also has height 1 between 0 and 0.99, but is also 1
between 999.99 and 1000, and zero elsewhere.  Are these two distributions
different in a meaningful way?  They have very different means and
variances, but for most samples they will look the same (and if you throw
out outliers they will look even more similar).  Different tests give
different results because they focus on different types of differences.
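A small R sketch shows the same sample getting different p-values from different tests. It assumes base R's `shapiro.test()` and, only if the package is installed, the nortest functions `ad.test`, `cvm.test`, `lillie.test`, `pearson.test` and `sf.test`. The helper name `compare_normality_tests()` is hypothetical. The example data, a t distribution with 5 degrees of freedom (heavier tails than normal), is an arbitrary illustrative choice.

```r
# Hypothetical helper: collect p-values from several normality tests.
# shapiro.test() is in base R's stats package; nortest is optional.
compare_normality_tests <- function(x) {
  p <- c(shapiro = shapiro.test(x)$p.value)
  if (requireNamespace("nortest", quietly = TRUE)) {
    p <- c(p,
           anderson_darling = nortest::ad.test(x)$p.value,
           cramer_von_mises = nortest::cvm.test(x)$p.value,
           lilliefors       = nortest::lillie.test(x)$p.value,
           pearson_chisq    = nortest::pearson.test(x)$p.value,
           shapiro_francia  = nortest::sf.test(x)$p.value)
  }
  p
}

set.seed(2011)
x <- rt(300, df = 5)   # illustrative data: mildly heavy-tailed, n = 300
p <- compare_normality_tests(x)
round(p, 4)
```

The tests weight the tails, the centre and the overall shape differently, so disagreement among these p-values is expected rather than a sign that one of them is broken.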

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Robert Baer
Sent: Friday, May 27, 2011 5:28 PM
To: Salil Sharma; R-help at r-project.org
Subject: Re: [R] Normality test

> I am writing to inquire about the normality tests in the nortest package.
> I have a random data set consisting of 300 samples. I am curious about
> which normality test in R would give me a precise measurement of whether
> the data sample follows a normal distribution. As the p-value differs from
> test to test, I would be grateful if you could help me identify a suitable
> test in R for this medium-size data set.

I am neither a statistician nor an expert on these types of tests, but I'm
guessing that you are unlikely to get a good answer even from people with
such qualifications, as such judgments can only be made in the context of a
specific problem.  You have not provided us with such a problem (please read
the posting guide).

That admonishment aside, I typically start by using qqnorm() and qqline() to
plot my data against the expected theoretical quantiles.  If your data is
normal, the points will fall close to the line, apart from sampling noise.
Skewness and departures from normality in the tails produce very
characteristic patterns in the plots, which you can learn about by plotting
some simulated data that is left-skewed, right-skewed, long-tailed, or
short-tailed.
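One way to do that practice run in base R is sketched below. The generators are illustrative choices, not the only ones: `rexp()` gives right skew, its negation gives left skew, `rt(df = 3)` gives long tails, and `runif()` gives short tails. A normal sample is included for reference.

```r
# Normal QQ plots of simulated data with known shapes, to learn the
# characteristic patterns (generators chosen for illustration only).
set.seed(1)
n <- 300
samples <- list(
  "normal"       = rnorm(n),
  "right-skewed" = rexp(n),          # long right tail
  "left-skewed"  = -rexp(n),         # mirror image: long left tail
  "long-tailed"  = rt(n, df = 3),    # heavier tails than normal
  "short-tailed" = runif(n)          # lighter tails than normal
)
op <- par(mfrow = c(2, 3))           # one panel per sample
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm)
  qqline(samples[[nm]])
}
par(op)
```

Roughly, skewed data bend into a curve: convex for right skew, concave for left skew. In long-tailed data both ends swing away from the line more steeply, and in short-tailed data both ends flatten out.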

I personally find this graphical feedback to be a much more useful way to
understand my data than doing a single normality test that produces a
p-value based upon assumptions I may not be privy to.

For more, see the help by typing:
?qqnorm
?qqline

Rob

------------------------------------------
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
800 W. Jefferson St.
Kirksville, MO 63501
660-626-2322
FAX 660-626-2965

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.