[R] How to run Shapiro-Wilk test for each grouped variable?

David Winsemius dwinsemius at comcast.net
Fri Apr 9 19:28:51 CEST 2010


On Apr 9, 2010, at 10:51 AM, Iurie Malai wrote:

> Thank you, David!
>
> Here is the code to read my file:
>> data <- read.table("data.txt", header=TRUE, sep=";",  
>> na.strings="NA", dec=".", strip.white=TRUE)
>
> Jorge Ivan Velez gave me a working solution, but I am ready to learn
> yours too.

I don't think I want to play anymore. Running Jorge's code seemed at
first to be pretty good evidence that this kind of investigation is
prone to very misleading results, to which I would not want to expose
the unwary. Only one of those thirty "tests of normality", on what
appeared at first glance to be "normal" data, actually failed to reject
the null hypothesis.

(That arose because he only selected 100 normal values and then
replicated them across 10 and 100 rows and columns. You can verify this
with table(unlist(d[,-1])), so I suppose the widespread rejection could
be considered a proper result. Notice that d[11,2] == d[1,20].)
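
The duplication check described above can be tried on made-up data. This is only a sketch: the layout of `d` below (one grouping column plus ten numeric columns that all reuse the same 100 draws) is an assumption for illustration, not Jorge's actual code, and the names `x`, `n.cells`, `n.unique` and `counts` are hypothetical.

```r
set.seed(42)
x <- rnorm(100)   # only 100 distinct normal draws

# Hypothetical layout: the same 100 values reused in each of 10 columns.
d <- data.frame(g = gl(4, 25),
                matrix(rep(x, times = 10), nrow = 100, ncol = 10))

vals     <- unlist(d[, -1])       # all numeric cells, grouping column dropped
n.cells  <- length(vals)          # total number of cells
n.unique <- length(unique(vals))  # number of distinct values among them
counts   <- table(vals)           # how often each distinct value occurs

n.cells
n.unique
range(counts)
```

If `n.unique` is much smaller than `n.cells`, the columns are not independent samples, and any run of per-column tests should be read with that in mind.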

Automating the task of testing for normality reminds me of the methods  
I was forced to use in Green Belt class, although their favorite  
"normality" statistic was the Anderson-Darling test. I had by that  
point decided to bite my tongue because the Black Belt instructors  
were rather annoyed at hearing my objections and pained reactions to  
their version of statistics.

-- 
David.


>
> Iurie
>
> 2010/4/9 David Winsemius <dwinsemius at comcast.net>:
>> OK, we have the data, now ... where is the code that you used to read
>> that data? It is labeled as a csv file but does not have commas as
>> separators.
>> Post any follow-ups to the r-help list. I do not offer off-list
>> consulting.
>>
>> When you post data to the list it needs to have a file extension of
>> ".txt".
>>
>> --
>> David
>>
>> On Apr 9, 2010, at 10:08 AM, Iurie Malai wrote:
>>
>>> I attached a file with the data and corrected the grouping factor
>>> name in the working commands:
>>>
>>>> data.n<-names(data)  # put names into a vector called data.n
>>>> by(eval(parse(text=(paste("data",data.n[3],sep="$")))),
>>>> data$groupFactor,
>>>> shapiro.test)  # run shapiro.test
>>>
>>> and not working:
>>>
>>>> for (r in 3:18) {
>>>> by(eval(parse(text=(paste("data",data.n[3],sep="$")))),
>>>> data$groupFactor,
>>>> shapiro.test)
>>>> }
>>>
>>>
>>> 2010/4/9 David Winsemius <dwinsemius at comcast.net>:
>>>>
>>>> On Apr 9, 2010, at 8:16 AM, Iurie Malai wrote:
>>>>
>>>>> I want to run the Shapiro-Wilk test for each variable in my dataset,
>>>>> each grouped by the variable groupFactor.
>>>>> I have these working commands:
>>>>>
>>>>>> data.n<-names(data) # put names into a vector called data.n
>>>>>> by(eval(parse(text=(paste("data",data.n[3],sep="$")))),
>>>>>> data$factor,
>>>>>> shapiro.test) #run shapiro.test
>>>>>
>>>>> but I must change the variable number manually. How can I
>>>>> automate this?
>>>>>
>>>>> I tried this:
>>>>>
>>>>>> for (r in 3:18) {
>>>>>> by(eval(parse(text=(paste("data",data.n[3],sep="$")))),
>>>>>> data$groupFactor,
>>>>>> shapiro.test)
>>>>>> }
>>>>
>>>> Not able to test since you have provided code that works with data
>>>> that is not available. Inside for loops one needs either to make an
>>>> assignment or print the results. Had the data been available I would
>>>> have wrapped print() around the full by expression to see if my
>>>> hypothesis could be tested.
>>>>
>>>> --
>>>> David.
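
The advice above (print or assign inside a for loop) can be sketched as follows. Since the original data file is not available here, the data frame below is a hypothetical stand-in with made-up columns `v1` to `v3`; only `groupFactor`, `data.n`, `by()` and `shapiro.test()` come from the thread. Note also that the posted loop uses `data.n[3]` on every pass, so it would test the same variable each time; the sketch uses the loop index `r` instead, and `data[[r]]` in place of `eval(parse(...))`.

```r
# Hypothetical stand-in for the poster's data (the real file is not shown).
set.seed(1)
data <- data.frame(
  id          = 1:60,
  groupFactor = factor(rep(c("a", "b", "c"), each = 20)),
  v1          = rnorm(60),
  v2          = rexp(60),
  v3          = runif(60)
)
data.n <- names(data)  # put names into a vector called data.n

# Option 1: print explicitly, because results are not auto-printed
# inside a for loop.
for (r in 3:5) {
  cat("\nVariable:", data.n[r], "\n")
  print(by(data[[r]], data$groupFactor, shapiro.test))
}

# Option 2: assign the results instead of printing them. Here only the
# p-values are kept, as a groups-by-variables matrix.
p.values <- sapply(data.n[3:5], function(v) {
  tapply(data[[v]], data$groupFactor,
         function(x) shapiro.test(x)$p.value)
})
p.values
```

With the real data the column range would be `3:18` as in the posted loop. Running many such tests at once invites the kind of over-interpretation discussed at the top of this message, so the p-value table is best treated as a rough screen rather than a verdict.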
>>>>>
>>>>> but it is not working and gives no errors. Why?
>>>>>
>>>>> Please help.
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Iurie Malai, Senior Lecturer
>>>>> Department of Psychology
>>>>> Faculty of Psychology and Special Education
>>>>> Ion Creanga Moldova Pedagogical State University - www.upsm.md
>>>>>
>>>>> http://en.wikipedia.org/wiki/Ion_Creang%C4%83_Pedagogical_State_University
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>> David Winsemius, MD
>>>> West Hartford, CT
>>>>
>>>>
>>> <data.csv>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>
>
>
> -- 
> Regards,
> Iurie Malai, Senior Lecturer
> Department of Psychology
> Faculty of Psychology and Special Education
> Ion Creanga Moldova Pedagogical State University - www.upsm.md
> http://en.wikipedia.org/wiki/Ion_Creang%C4%83_Pedagogical_State_University
> <data.txt>

David Winsemius, MD
West Hartford, CT


