[R] Help with three-way anova

John Fox jfox at mcmaster.ca
Tue Apr 5 17:39:27 CEST 2005


Dear Michael,

For unbalanced data, you might want to take a look at the Anova()
function in the car package.
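
As a rough sketch of what that might look like: the data frame below is
simulated (the cell sizes follow the design in your message, but the IL.4
values are random numbers, not your measurements), and the names `dat` and
`fit` are only placeholders. For an additive model like yours, drop1() in
base R and a Type II Anova() from car both test each term adjusted for all
the others, whereas the default sequential table depends on the order of
the terms when the data are unbalanced.

```r
# Simulated data only: cell sizes match the quoted design (3, 2, 4, 3),
# but the response values are random, not the poster's measurements.
set.seed(1)
n_per_cell <- c(3, 2, 4, 3)
dat <- data.frame(
  Infected   = factor(rep(c("yes", "yes", "yes", "no"), times = n_per_cell)),
  Vaccinated = factor(rep(c("yes", "yes", "no",  "no"), times = n_per_cell)),
  Lesions    = factor(rep(c("yes", "no",  "yes", "no"), times = n_per_cell))
)
dat$IL.4 <- rnorm(nrow(dat))

fit <- lm(IL.4 ~ Infected + Vaccinated + Lesions, data = dat)

# Sequential (Type I) tests: each term is adjusted only for the terms
# listed before it, so reordering the formula changes the table.
print(anova(fit))

# Base R: each term tested after adjusting for all the other terms.
print(drop1(fit, test = "F"))

# Type II tests from car, run only if the package is installed.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(fit, type = 2))
}
```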

As well, it probably makes sense to read something about how linear
models are expressed in R. ?lm and ?formula both have some information
about model formulas; the manual "An Introduction to R" that comes with R
has a chapter on statistical models; and books on R typically take up
the subject at greater length.
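
To give a flavour of the formula notation, here is a small sketch. The
names y, a, b and c are placeholders, and the data frame is again
simulated with the cell sizes from your message. One thing it suggests,
under those assumptions, is that with no uninfected/vaccinated subjects
the Infected:Vaccinated interaction cannot be estimated, so lm() reports
its coefficient as NA.

```r
# How formula operators expand; no data are needed for this step.
print(attr(terms(y ~ a + b), "term.labels"))          # main effects only
print(attr(terms(y ~ a * b), "term.labels"))          # a, b and a:b
print(attr(terms(y ~ (a + b + c)^2), "term.labels"))  # plus all two-way terms

# Simulated data only: cell sizes follow the quoted design, values are random.
set.seed(1)
n_per_cell <- c(3, 2, 4, 3)
dat <- data.frame(
  Infected   = factor(rep(c("yes", "yes", "yes", "no"), times = n_per_cell)),
  Vaccinated = factor(rep(c("yes", "yes", "no",  "no"), times = n_per_cell))
)
dat$IL.4 <- rnorm(nrow(dat))

# Cross-tabulate the two factors: the uninfected/vaccinated cell is empty.
cells <- with(dat, table(Infected, Vaccinated))
print(cells)

# Requesting the interaction anyway: its column duplicates the Vaccinated
# column in this layout, so the coefficient comes back as NA.
fit_int <- lm(IL.4 ~ Infected * Vaccinated, data = dat)
print(coef(fit_int))
```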

I hope this helps,
 John 

On Tue, 5 Apr 2005 15:51:46 +0100
 "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk> wrote:
> Hi
> 
> I have data from 12 subjects.  The measurement is log(expression) of a
> particular gene and can be assumed to be normally distributed.  The 12
> subjects are divided into the following groups:
> 
> Infected, Vaccinated, Lesions - 3 measurements
> Infected, Vaccinated, No Lesions - 2 measurements
> Infected, Not Vaccinated, Lesions - 4 measurements
> Uninfected, Not Vaccinated, No Lesions - 3 measurements
> 
> Although presence/absence of lesions could be considered to be a
> phenotype, here I would like to use it as a factor.  This explains some
> of the imbalance in the design (i.e. we could not control how many
> subjects, if any, in each group would get lesions).
> 
> First impressions - the data look as we would expect.  Gene
> expression is lowest in the infected/not vaccinated group, next
> lowest in the infected/vaccinated group, and highest in the
> uninfected/not vaccinated group.  So the working hypothesis is that
> expression of the gene in question is lowered by infection, and that
> the vaccine partly alleviates this effect, though not to the level
> of a totally uninfected subject.  We *might* have access to data
> for an uninfected/vaccinated group; my pet scientist is digging for
> this as we speak.
> 
> As for lesions, none of the uninfected subjects have them, all of
> the infected/not vaccinated subjects have them, and some of the
> infected/vaccinated subjects have them while others don't.  Again,
> this makes for a very sensible hypothesis if we treat presence/absence
> of lesions as a phenotype.  In addition, I want to know whether gene
> expression is linked to presence/absence of lesions, but only one
> group of subjects has both lesions and non-lesions within it.
> Eye-balling this group, presence/absence of lesions and gene
> expression do not appear to be linked.
> 
> So I have this as a data.frame in R, and I wanted to run an analysis of
> variance.  I did:
> 
> fit <- aov(IL.4 ~ Infected + Vaccinated + Lesions, data = data)
> summary(fit)
> 
> And got:
> 
>             Df  Sum Sq Mean Sq F value    Pr(>F)    
> Infected     1 29.8482 29.8482 66.7037 3.761e-05 ***
> Vaccinated   1 13.5078 13.5078 30.1868 0.0005777 ***
> Lesions      1  0.0393  0.0393  0.0878 0.7746009    
> Residuals    8  3.5798  0.4475                      
> ---
> 
> This tells me that Infected and Vaccinated are highly significant,
> whereas Lesions is not.
> 
> So, what I want to know is:
> 
> 1) Given my unbalanced experimental design, is it valid to use aov?
> 2) Have I used aov() correctly?  If so, how do I get access to results
> for interactions?
> 3) Is there some other, more relevant way of analysing this?  What I am
> really interested in is the gene expression, and whether it can be shown
> to be statistically related to one or more of the factors involved
> (Infected, Vaccinated, Lesions) or interactions between those factors.
> 
> Many thanks in advance
> 
> Mick
> 

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
