[R] Help With ANOVA (corrected please ignore last email)

Joshua Wiley jwiley.psych at gmail.com
Tue Jul 6 20:18:28 CEST 2010


Hello,

Are you saying that the -4.60517 values are supposed to be treated as
missing?  Unless you set them to NA in R, they will be treated as real
values.  This would make a huge difference.

I can tell you that your formula: aov(Intensity ~ Group, data =
zzzanova) is treating the variable 'Intensity' as the outcome (or
dependent variable) and the variable 'Group' as the predictor (or
independent variable).  But of course, whether you are using an ANOVA
correctly depends on a lot more.  Oh, and yes, Pr(>F) is the p-value.

However, I may have found the source of your discrepancy.  Looking at
the data you provided:

> str(zzzanova)
'data.frame':	54 obs. of  3 variables:
 $ Intensity: num  -4.61 -4.61 -4.61 -4.61 -4.61 ...
 $ Group    : Factor w/ 6 levels "Group1","Group2",..: 1 1 1 1 1 1 1 1 1 2 ...
 $ Sample   : num  1 2 3 4 5 6 7 8 9 10 ...

Notice that 'Group' is a factor with 6 levels.  That means that R is
not treating this variable's scale as interval or ratio variable.  In
fact, it is being treated as nominal.  Looking at the summary of the
ANOVA:

> summary(aov(Intensity ~ Group, data = zzzanova))
            Df Sum Sq Mean Sq F value  Pr(>F)
Group        5  98.85  19.771  2.1469 0.07576 .
Residuals   48 442.03   9.209
---

indicates that the variable 'Group' has 5 degrees of freedom, which
makes sense given that it has 6 levels.  Now look what happens when I
convert the variable 'Group' from a factor to a numeric variable
(using as.numeric() on the variable in the formula).

> summary(aov(Intensity ~ as.numeric(zzzanova$Group), data = zzzanova))
                           Df Sum Sq Mean Sq F value   Pr(>F)
as.numeric(zzzanova$Group)  1  80.18  80.182  9.0503 0.004042 **
Residuals                  52 460.70   8.860

Now Group only has 1 degree of freedom (because it is being treated as
a single predictor now instead of having multiple levels).  Also the
new p-value seems within rounding error of what you expected.


All that said, my guess is that you really do want to treat the
variable 'Group' as a factor.  Treated as a numeric variable what you
are basically testing is the relationship between Group and Intensity
as group increases from 1 to 6.  This is very different from testing
the relationship between 6 different groups and Intensity.

As a disclaimer, that is not really the best explanation of what is
going on, but every other way I tried to say it seemed worse, so if
you are unclear you might consult a statistics text or perhaps others
can give a more erudite explanation.


So to summarize, take a look at the original data for Group that you
pulled into R.  If you do not want it to be a factor you should either
convert it somehow, or read the data in differently.  This may not be
the actual issue, but it was the closest I could come to replicating
your p-value.

Best regards,

Josh


On Tue, Jul 6, 2010 at 10:09 AM, Amit Patel <amitrhelp at yahoo.co.uk> wrote:
> Hi Joshua
>
> Thanks very much for your help. I will take your advice and work on my problem. I am getting my expected p-values from GeneSpring software. I thought the problem lied in the treatment of the NA (-4.60517) values and maybe how they are treated in the analysis (i.e ignored or -4.60517). Im not sure what your ANOVA background is but I just wanted to check that I'm using ANOVA correctly. Im using the assumption that the Pr(>F)1 value is the actual p-value.
>
> Thanks again for your help
>
>
>
> ----- Original Message ----
> From: Joshua Wiley <jwiley.psych at gmail.com>
> To: Amit Patel <amitrhelp at yahoo.co.uk>
> Cc: r-help at r-project.org
> Sent: Tue, 6 July, 2010 16:46:30
> Subject: Re: [R] Help With ANOVA (corrected please ignore last email)
>
> Hi Amit,
>
> When I copy in your data and run
>
> aov(Intensity ~ Group, data = zzzanova)
>
> I get neither the p-value you showed nor the one you expected.  My
> suggestions at things to look at would be
>
> 1) Where/How did you get the expected p-value?  Another statistics
> program (e.g., SPSS or SAS)?  It helps to know where the expected
> value came from.  If it was from another program, I would also
> recommend searching the R-help archives (for instance for SPSS and
> ANOVA)
>
> 2) Perhaps something is happening with the unbalanced design (e.g.,
> aov() in R handles it differently than you expected)?
>
> 3) You only mention that the p-values do not match.  What about other
> aspects?  Are the df the same?  I would try to be certain that the
> data in R is the same as what was used to calculate the expected
> p-value.  summary(zzzanova) will return some nice summary statistics
> for each column of your dataframe.
>
> Cheers,
>
>
> Josh
>
>
>
> On Tue, Jul 6, 2010 at 6:12 AM, Amit Patel <amitrhelp at yahoo.co.uk> wrote:
>> Sorry i had a misprint in the appendix code in the last email
>>
>>
>> Hi I needed some help with ANOVA
>>
>> I have a problem with My ANOVA
>> analysis. I have a dataset with a known ANOVA p-value, however I can
>> not seem to re-create it in R.
>>
>> I have created a list (zzzanova) which contains
>> 1)Intensity Values
>> 2)Group Number (6 Different Groups)
>> 3)Sample Number (54 different samples)
>> this is created by the script in Appendix 1
>>
>> I then conduct ANOVA with the command
>>> zzz.aov <- aov(Intensity ~ Group, data = zzzanova)
>>
>> I get a p-value of
>> Pr(>F)1
>> 0.9483218
>>
>> The
>> expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly
>> or have put in a wrong formula. I am trying to do an ANOVA analysis
>> across all 6 Groups. Is there something wrong with my formula. But I think I
>> have made a mistake in the formula rather than anything else.
>>
>>
>>
>>
>> APPENDIX 1
>>
>> datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 3.003749, -4.60517,
>>    2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, -4.60517,
>>    -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, -4.60517, 2.39127,
>>    2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, -4.60517, 2.121427, 1.973118,
>>    -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517,  0.023703, -4.60517,
>>    2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 1.371895, 1.533227)
>>
>> "zzzanova" <-
>> structure(list(Intensity = datalist,
>> Group = structure(c(1,1,1,1,1,1,1,1,1,
>>         2,2,2,2,2,2,2,2,
>>         3,3,3,3,3,3,3,3,3,
>>         4,4,4,4,4,4,4,4,4,4,
>>         5,5,5,5,5,5,5,5,5,
>>         6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3", "Group4", "Group5", "Group6"), class = "factor"),
>>    Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9,
>>    10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
>>    20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
>>    30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54)
>> ))
>> , .Names = c("Intensity",
>> "Group", "Sample"), row.names =
>> c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
>> "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
>> "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
>> "31", "32", "33", "34", "35", "36", "37", "38", "39", "40",
>> "41", "42", "43", "44", "45", "46", "47", "48", "49", "50",
>> "51", "52", "53", "54"),class = "data.frame")
>>
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>
>
>
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



More information about the R-help mailing list