[R] Shapiro-Wilk cpoefficients: 2 Qs

(Ted Harding) Ted.Harding at wlandres.net
Thu Apr 5 10:11:55 CEST 2012


[See at end]
On 05-Apr-2012 00:34:30 Peter Ehlers wrote:
> Hi Ted,
> 
> On 2012-04-04 15:06, Ted Harding wrote:
>> Greetings!
>> I want to have the coefficients that R uses in shapiro.test()
>> for the Shapiro-Wilk test for a prticular sample size, i.e.
>> the a[i] in
>>
>>    W = Sum(a[i]*x[i])/(Sum(x[i] - mean(x))^2)
>>
>> (where the x[i] are sorted). Two questions:
>>
>> Q1:
>> Is there a readymade R function from which I can extract these?
>>
>> Q2:
>> I was wondering if I might be able to modify the code for the
>> function shapiro.test() so as to obtain these. When I enter
>>
>>    shapiro.test
>>
>> I get:
>>
>> function (x)
>> {
>>      DNAME<- deparse(substitute(x))
>>      x<- sort(x[complete.cases(x)])
>>      stopifnot(is.numeric(x))
>>      n<- length(x)
>>      if (n<  3 || n>  5000)
>>          stop("sample size must be between 3 and 5000")
>>      rng<- x[n] - x[1L]
>>      if (rng == 0)
>>          stop("all 'x' values are identical")
>>      if (rng<  1e-10)
>>          x<- x/rng
>>      n2<- n%/%2L
>>      sw<- .C(R_swilk, init = FALSE, as.single(x), n, n1 = n,
>>          n2, a = single(n2), w = double(1), pw = double(1), ifault =
>>          integer(1L))
>>      if (sw$ifault&&  sw$ifault != 7)
>>          stop(gettextf("ifault=%d. This should not happen", sw$ifault),
>>              domain = NA)
>>      RVAL<- list(statistic = c(W = sw$w), p.value = sw$pw, method =
>> "Shapiro-Wilk normality test",
>>          data.name = DNAME)
>>      class(RVAL)<- "htest"
>>      return(RVAL)
>> }
>> <environment: namespace:stats>
>>
>>
>> So, on the off-chance that the variable 'sw' computed therein might
>> contain something useful, I changed "return(RVAL)" to "return(sw)",
>> just in case the coefficients might be lurking as a component of sw,
>> and used this to define a function SW_ted(). I then ran
>>
>>    SW_ted(rnorm(30))
>>    # Error in SW_ted(rnorm(30)) : object 'R_swilk' not found
>>
>> Since shapiro.test(rnorm(30)) works perfectly, and since the
>> "stats:" namespace is already present, I am wondering why
>> "object 'R_swilk' not found" when it clearly can be found by
>> shapiro.test().
>>
>> So why is it that in the ".C" call:
>>
>>    sw<- .C(R_swilk, ... )
>>
>> my modifiction of shapiro.test() doesn't find it?
>>
>> (No doubt there is some dumb oversight behind this, but I'd be
>> grateful to be told what it is)!
> 
> No dumb oversight that I can see, but:
> Q1: I have no idea, but I doubt that there is an existing function.
> 
> Q2: I think that the environment of the newly created function
>      SW_ted needs to be set appropriately; try this:
> 
>    environment(SW_ted) <- environment(shapiro.test)
> 
> More information should be available from the constants used in
> swilk.c (in the R sources) which is a translation of the Fortran
> code at http://lib.stat.cmu.edu/apstat/R94.
> 
> Peter Ehlers
> 
>>
>> With thanks,
>> Ted.

Thanks, Peter! Your suggestion
  environment(SW_ted) <- environment(shapiro.test)
got it working, and now I can see that the variable 'sw'
has six components, of which shapiro.test() returns only two:
  statistic = c(W = sw$w), p.value = sw$pw

With X <- rnorm(30), SW <- SW_ted(X), one component, SW$init,
is itself a list which includes a vector of 30 numbers which
are the same as sort(X).

Another is SW$a, a vector of 15 numbers:

$a
 [1] 0.41668701 0.30027449 0.25057057 0.21649644 0.18856336 0.16440846
 [7] 0.14279744 0.12299577 0.10452516 0.08705221 0.07033122 0.05417194
[13] 0.03842017 0.02294516 0.00763082

and I strongly suspect that these, in effect, will give the coefficients.
Now to check it out!

Thanks again, and best wishes,
Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 05-Apr-2012  Time: 09:11:51
This message was sent by XFMail



More information about the R-help mailing list