[R] shapiro.test() output

Peter Dalgaard p.dalgaard at biostat.ku.dk
Thu Jul 13 00:01:07 CEST 2006


<Matthew.Findley at ch2m.com> writes:

> R Users:
> 
> My question is probably more about elementary statistics than the
> mechanics of using R, but I've been dabbling in R (version 2.2.0) and
> used it recently to test some data.
> 
> I have a relatively small set of observations (n = 12) of arsenic
> concentrations in background groundwater and wanted to test my
> assumption of normality.  I used the Shapiro-Wilk test (by calling
> shapiro.test() in R) and I'm not sure how to interpret the output.
> Here's the input/output from the R console:
> 
> 	>As = c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)
> 	>shapiro.test(As)
> 
>       	  Shapiro-Wilk normality test
> 
> 	data:  As 
> 	W = 0.9513, p-value = 0.6555
> 
> How do I interpret this?  I understand, from poking around the internet,
> that the higher the W statistic the "more normal" the data.
> 
> What is the null hypothesis - that the data is normally distributed?  

Yup.

> What does the p-value tell me? A 65.55% chance of what: of getting a
> W statistic greater than or equal to 0.9513? (I picked this up from the
> Dalgaard book, Introductory Statistics with R, but it's not really
> sinking in with respect to how it applies to a Shapiro-Wilk test.)

*Smaller* or equal - W=1.0 is the "perfect fit". The W statistic is
pretty much the (squared) Pearson correlation applied to the curve
drawn by qqnorm(). (The exact definition of what goes on the x axis
differs slightly, I believe.)
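To make the correlation remark concrete, here is a minimal sketch using the arsenic data from the original post. It assumes the x axis of qqnorm() is qnorm(ppoints(n)), which is what qqnorm() uses by default, and computes the squared correlation between the sorted data and those normal scores. That is a Shapiro-Francia-style approximation, not R's exact Shapiro-Wilk algorithm, so it should come out close to, but not identical to, the W reported by shapiro.test(). The variable names z and r2 are just illustrative.

```r
# Arsenic concentrations from the original post (n = 12)
As <- c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)

# Normal scores used on the x axis of qqnorm(): qnorm(ppoints(n))
z <- qnorm(ppoints(length(As)))

# Squared Pearson correlation between the sorted data and the normal
# scores, i.e. how straight the Q-Q plot is (an approximation, not exact W)
r2 <- cor(sort(As), z)^2

# W as computed by shapiro.test() (C translation of Royston's algorithm)
W <- unname(shapiro.test(As)$statistic)

print(round(c(approx = r2, W = W), 4))
```

Both numbers are close to 1, which is the numerical counterpart of a Q-Q plot that is close to a straight line.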

A low p-value would indicate that W is too small (too extreme) to be
explained by chance variation, i.e. evidence against normal distribution.
In the present case you have no evidence against normal distribution
(beware that this is not evidence _for_ normality).
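The "smaller or equal" reading of the p-value can be checked by brute force. A minimal sketch, assuming it is enough to simulate standard normal samples (W does not depend on the location or scale of the data): draw many normal samples of size 12, compute W for each, and count how often W comes out at or below the observed value. The proportion should land near the 0.6555 that shapiro.test() reports; the seed and number of replicates are arbitrary choices, not anything from the original post.

```r
# Arsenic data from the original post and its observed W / p-value
As <- c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)
fit <- shapiro.test(As)
W_obs <- unname(fit$statistic)

set.seed(1)        # arbitrary seed, for reproducibility
n_sim <- 10000     # number of simulated normal samples (arbitrary)

# W for samples that really are normal, i.e. under the null hypothesis
W_null <- replicate(n_sim, shapiro.test(rnorm(length(As)))$statistic)

# Proportion of null W values at least as extreme: *smaller* or equal
p_sim <- mean(W_null <= W_obs)

print(c(simulated = p_sim, shapiro.test = fit$p.value))
```

A small p-value would mean that only a few genuinely normal samples produce a W as low as the one observed.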

(Personally, I'm not too happy about these normality tests. They tend
to lack power in small samples and in large samples they often reject
distributions which are perfectly adequate for normal-theory
analysis. Learning to evaluate a QQ plot seems a better idea.) 
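Both halves of that remark are easy to look at in R. The sketch below is an illustration under assumptions of my own choosing, not a formal power study: it uses uniform samples of size 12 as an example of a clearly non-normal distribution and counts how often shapiro.test() rejects at the 5% level (at this sample size, typically only a small fraction of the time). It then draws the QQ plot for the arsenic data with qqnorm() and qqline().

```r
As <- c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)

set.seed(2)     # arbitrary seed
n_sim <- 2000   # arbitrary number of replicates

# Rejection rate at alpha = 0.05 for uniform (non-normal) samples of size 12
p_unif <- replicate(n_sim, shapiro.test(runif(12))$p.value)
power_unif <- mean(p_unif < 0.05)
print(power_unif)

# QQ plot of the arsenic data; by default qqline() passes through the
# first and third quartiles
qqnorm(As, main = "Normal Q-Q plot, arsenic (n = 12)")
qqline(As)
```

Judging the QQ plot means looking for systematic curvature or outlying points rather than relying on a single accept/reject decision.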

 
> The method description - retrieved using ?shapiro.test() - is a bit
> light on details.

There are references therein, though...

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


