Re: [BioC] F-tests for factorial effects - limma

Ramon Diaz-Uriarte rdiaz at cnio.es
Fri Jan 7 13:25:20 CET 2005


Thanks Gordon and Naomi for the discussion.

I understand the example that Gordon gives, and the discussion and analyses in 
p. 30 and ff. of the limma user's guide.  I take seriously the good old 
advice of not even looking at main effects and their estimates in the 
pressence of interactions. 

However, in many cases it is necessary to address the isues of what we do when 
there are/aren't interactions and how we obtain estimates of main effects 
over levels of another factor. For instance, a growing concern with many 
microarray results in cancer research is that these are observational data,  
samples collected from patients that, annoyingly, differ in traits such as 
sex, age, diet, exercise, etc, which can affect gene expression. But these 
microarray data are often analized just for differences between cancer status 
without due considertaion to variation in other factors. J.D. Potter (and 
others) have made this point recently in several papers, where it is argued 
that many of the differences between cancer types seen in microarray studies 
could be the result of confounding by age and sex. 

So now suppose we are comparing gene expression in women with and without 
breast cancer, and want to include possible age effects (this is already much 
too simple a scenario). In a case such as this I think that a very reasonable 
biological question is to try to find those genes that show a difference 
between cancer and non-cancer patients, AND do NOT show an interaction 
age*cancer status. The estimate of interest for gene difference is usually an 
estimate over all ages.

A possible approach might be to carry out first a test of interaction. Then, 
use a very liberal criterion for "significance of interaction" (e.g., 
unadjusted p < 0.1). Finally, fit a model without interaction only to those 
genes that show no evidence of interaction, and obtain estimates and p-values 
of main effects (the liberal p-value for interaction would be a way of trying 
to overcome low power for detecting interactions).

I see two main problems here. One is the "pre-test estimation" issue (see 
recent thread about this issue by F. Hong, N. Altman and G. Smyth), but we 
definitely do not want to obtain the estimates of effect from a model with 
interaction (as Gordon mentions in his second paragraph). The second problem 
is that it is not obvious to me how to adjust the p-values from the second 
set of analyses. I think some of the recent work of Y. Benjamini on D. 
Yekutieli on hierarchical FDR on trees of hypotheses mighe help here.


R.



On Friday 07 January 2005 02:12, Gordon Smyth wrote:
> Why don't I find main effects of interest? Let's suppose you're considering
> genotype*treatment and the genotypes are wt and ko (knockout). My
> experience is that an experimenter wants to know which genes respond to the
> treatment in the wt, which genes respond in the ko, and which respond
> differently between the two genotypes (the interaction). So the experiment
> wants two separate lists and want to compare and constrast these lists. But
> the factorial main effect for the treatment refers to the *average*
> response for a each gene averaged over wt and ko. So it is not a real
> effect for any given genotype. I've never seen a microarray context where
> the the experimenter wanted to know responses averaged over different and
> essentially different genotypes. Of course one may want an effect averaged
> over different cell lines or individuals from the same genotype, but not
> over different genotypes which are essentially different. That's my take on
> it. I would be interested in different opinions. I've written briefly on
> this in the Factorial Design section on page 30 of the User's Guide at
> http://bioinf.wehi.edu.au/limma/usersguide.pdf
>
> The reason that main effects are computationally different to interactions
> is that I have to treat all microarray experiments as unbalanced designs. I
> have to do this because I have to allow for missing values or spot quality
> weights in the case of two-colour data and for weights arising from robust
> estimation or affyPLM standard errors in the case of affy data. This means
> that the anova break-down is a sequential ANOVA table with first main
> effect adjusted for intercept, second main effect adjusted for the first,
> and interaction adjusted for all main effects. In the case of the
> interaction one can just fit the full model and then test that the
> coefficients are zero. With main effects one can't do this because that
> would be equivalent to adjusting the main effects for the interaction which
> is not what is wanted. This means that to get the main effect SS you
> actually have to fit three or four different models to get the sequential
> SS. Of course you could othogonalize all the coefficients to solve the
> problem, but this would have to be done on a gene-wise basis to allow for
> the possibility of gene-specific weights.
>
> Gordon
>
> At 01:14 AM 23/12/2004, Naomi Altman wrote:
> >Thanks, Gordon.  We will try contrasts.fit.  If that does not give us what
> >we want, I will consider writing an aov.series routine.
> >
> >The microarray experiments I see are often factorial, RCB, or split plot
> >designs.  Particularly with Affy arrays, they are often balanced.  I guess
> >I am used to thinking in terms of "effects" rather than individual
> >contrasts, but why would the main effect and interaction idea be different
> >for microarray experiments than for other experiments?
> >
> >e.g. treatment 1-3
> >       genotype 1-3
> >
> >The interaction is always of interest.  However, even in the absence of
> >interaction, the genotype or treatment effect is often of interest.  e.g.
> >the genotype may be various knockouts - we are interested in knowing what
> >other genes have been induced or repressed.  Why is the test for main
> >effects for a particular gene less interesting in this context than e.g.
> >the change in weight or other univariate measure?
> >
> >
> >--Naomi
> >
> >At 12:23 AM 12/23/2004 +1100, Gordon K Smyth wrote:
> >> > Date: Tue, 21 Dec 2004 17:21:19 -0500
> >> > From: Naomi Altman <naomi at stat.psu.edu>
> >> > Subject: [BioC] F-tests for factorial effects - limma
> >> > To: bioconductor at stat.math.ethz.ch
> >> >
> >> > I am analyzing a 2-factor factorial Affy experiment, with 3 d.f. for
> >> > each factor.
> >> >
> >> > I would like to get the F-tests for the main effects and interactions
> >>
> >> using
> >>
> >> > limma.
> >> >
> >> > I have computed all the contrasts, and got the t-tests (both
> >>
> >> unadjusted and
> >>
> >> > eBayes).  I do know how to combine these into F-tests "by hand" but I
> >>
> >> could
> >>
> >> > not figure out if there was a simple way to do this using limma.
> >>
> >>limma doesn't have any easy way to deal with main effects and
> >>interactions, at least not with main
> >>effects, interactions are actually simpler.  I haven't implemented this
> >>because I've never been
> >>able to figure out what one would do with these things in a microarray
> >>context.
> >>
> >>To compute F-tests for main effects and interaction, the easiest way
> >>would probably be to compute
> >>the SS for main effects and interactions by non-limma means, then use
> >>shrinkVar() to adjust the
> >>residual mean squares, i.e., the F-statistic denominators.
> >>
> >>If you only want F-tests for interactions, the following code would work:
> >>
> >>X <- model.matrix(~a*b)
> >>fit <- lmFit(eset, X)
> >>p <- ncol(X)
> >>cont.ia <- diag(p)[,attr(X,"assign")==3]
> >>fit.ia <- eBayes(contrasts.fit(fit, cont.ia))
> >>
> >>Now fit.ia contains the F-statistic and p-values for the interaction in
> >>fit.ia$F and
> >>fit.ia$F.p.value.
> >>
> >> > I had a look at FStat (classifyTestsF).  There seems to be a problem,
> >> > in that the matrix tstat is not premultiplied by the contrast matrix
> >> > when the F-statistics are computed.  So, if the contrasts are not
> >> > full-rank, an error is generated (instead of the F-statistics) because
> >> > nrow(Q) != ncol(tstat)..  (See the lines below).
> >>
> >>No, the code is correct.  FStat is quite happy with non full rank
> >>contrasts but the contrast
> >>matrix must be applied using contrasts.fit() before entering
> >>FStat().  You should not expect to
> >>see a contrast matrix inside the classifyTestsF() code.
> >>
> >>Gordon
> >>
> >> > if (fstat.only) {
> >> >          fstat <- drop((tstat%*% Q)^2 %*% array(1, c(r, 1)))
> >> >          attr(fstat, "df1") <- r
> >> >          attr(fstat, "df2") <- df
> >> >          return(fstat)
> >> >      }
> >> >
> >> > I figured that before I fiddled with the code, I would check to make
> >> > sure that I didn't miss an existing routine to do this.
> >> >
> >> > Thanks in advance.
> >> >
> >> > Naomi S. Altman                                814-865-3791 (voice)
> >> > Associate Professor
> >> > Bioinformatics Consulting Center
> >> > Dept. of Statistics                              814-863-7114 (fax)
> >> > Penn State University                         814-865-1348
> >> > (Statistics) University Park, PA 16802-2111
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)


Este correo electronico y, en su caso, cualquier fichero anexo al mismo, contiene informacion exclusivamente dirigida a su destinatario o destinatarios. Si Vd. ha recibido este mensaje por error, se ruega notificar esta circunstancia al remitente. Las ideas y opiniones manifestadas en este mensaje corresponden unicamente a su autor y no representan necesariamente a las del Centro Nacional de Investigaciones Oncologicas (CNIO).


The information contained in this message is intended for the addressee only. If you have received this message in error or there are any problems please notify the originator. Please note that the Spanish National Cancer Centre (CNIO), does not accept liability for any statements or opinions made which are clearly the sender's own and not expressly made on behalf of the Centre.



More information about the Bioconductor mailing list