[BioC] Limma nestedF

Wed Oct 11 17:03:45 CEST 2006

Hi Noelle,

First, please don't take things off-list. The list archives are a good 
resource for people to search, so taking questions/answers off-list 
reduces the information content.

noel0925 at sbcglobal.net wrote:
> Hi Jim,
> 
> Thank you very much for your response. I am only a novice at these
> types of analyses and am trying to get a handle on it all.
> 
> So in this particular experiment there are six gentoypes (a wild-type
> and 5 different mutants due to point mutations in the same gene). One
> of the goals of this experiment is to be able to rank the mutants
> according to their severity or the difference in their expression
> profile compared to the wild-type in treatment 1 and treatment 2. It
> is also of interest to see how the various mutants respond to the two
> treatments, and  finally it might also be interesting to consider the
> interaction term (genotype*treatment).  Thus, in this study the two
> main effects are of interest, as are the interaction effects.

Well, you don't have any classical main effects here. If you study 
section 8.7 in the Limma User's Guide, you will see what I mean. In fact 
the comparisons you show below are exactly what Gordon argues one should 
be making. Since there are no classical main effects (i.e., treatment 
without regard to WT/MT status, and WT vs MT without regard to 
treatment), then you don't really have to worry about interactions 
superceding main effects.

> 
> Here is my contrast matrix:
> 
> cont.matrix<- makeContrasts( WT.Treat.Effect=WT.T2 - WT.T1,
> 
> Mt1.Geno.Effect.T1=Mt1.T1 - WT.T1, Mt2.Geno.Effect.T1=Mt2.T1 - WT.T1,
>  Mt3.Geno.Effect.T1=Mt3.T1 - WT.T1, Mt4.Geno.Effect.T1=Mt4.T1 -
> WT.T1, Mt5.Geno.Effect.T1=Mt5.T1 - WT.T1,
> 
> Mt1.Geno.Effect.T2=Mt1.T2 - WT.T2, Mt2.Geno.Effect.T2=Mt2.T2 - WT.T2,
>  Mt3.Geno.Effect.T2=Mt3.T2 - WT.T2, Mt4.Geno.Effect.T2=Mt4.T2 -
> WT.T2, Mt5.Geno.Effect.T2=Mt5.T2 - WT.T2,
> 
> Mt1.Treat.Effect=Mt1.T2 - Mt1.T1, Mt2.Treat.Effect=Mt2.T2 - Mt2.T1, 
> Mt3.Treat.Effect=Mt3.T2 - Mt3.T1, Mt4.Treat.Effect=Mt4.T2 - Mt4.T1, 
> Mt5.Treat.Effect=Mt5.T2 - Mt5.T1,
> 
> Mt1.Int.Effect=(Mt1.T2 - Mt1.T1) - (WT.T2 - WT.T1), 
> Mt2.Int.Effect=(Mt2.T2 - Mt2.T1) - (WT.T2 - WT.T1), 
> Mt3.Int.Effect=(Mt3.T2 - Mt3.T1) - (WT.T2 - WT.T1), 
> Mt4.Int.Effect=(Mt4.T2 - Mt4.T1) - (WT.T2 - WT.T1), 
> Mt5.Int.Effect=(Mt5.T2 - Mt5.T1) - (WT.T2 - WT.T1), levels = design)
> 
> Does the fact that main effects are of interest in any way alter how
> you would tackle this problem? I am eager to hear your opinion!

This experiment reminds me of a certain type of client that I see. When 
we talk about their experiment and ask if a certain comparison is of 
interest, they nod their head vigorously and say 'yes!'. Unfortunately, 
they say the same thing regardless of the comparison, so I end up making 
every possible comparison I could think of, and I send the results off 
wondering how they are ever going to get anything useful from the 
experiment (not because the comps are bad per se, but because of the 
deluge of data I just buried them with).

This is the downside IMO of microarray experiments. They tend to be very 
expensive, so people often want to wring every last bit of information 
possible out of a given experiment, regardless of how interesting a bit 
of information may be.

I suppose one could analyze the data once, output all the significant 
terms, and then look at them at their leisure, but I would prefer a more 
hypothesis/goal driven approach such as your first statement. If you 
want to rank the mutants according to their severity, how would one do 
that? Is severity a genotypic or phenotypic quantity? If genotypic, is 
it measured by the number of differentially expressed genes when 
compared to normal, or can you rank based on the particular genes that 
are differentially expressed? Thinking about what you really want from 
the data and how you will measure that quantity IMO is a better way to 
go. Anyway, enough ranting ;-D

> 
> As regards handling interaction terms, I am aware of the concept of
> testing for interaction terms and dropping them if they are not
> significant, but how is this done in Limma or for microarray analyses
> in general- is there a package in BioC that does this? Also, if as
> you say the interaction terms are usually the only "interesting"
> contrasts in most microaray experiments like the knockout example you
> gave- then what do analysts usually do to handle this? It seems the
> examples I read in the literature never mention testing for the
> significance of interaction terms. Is there a refrerence for this you
> can point me to? I am eager to learn.

If you were to do something like this (as mentioned, it's not 
necessary), you would have to do it by hand. I don't think there is any 
functionality in limma for this, probably because a classical main 
effect is usually not sensical in the context of a microarray analysis.

> 
> I for example, looked at the Weaver mutant data a 2X2 factorial
> experiment for 2 color data found on page 75 of the Limma Users
> Manual. While my data is single channel, it is likewise a factorial
> design (6 X 2) and hence similar from that standpoint as well as that
> both studies consider main effects and interaction terms.

The factorial design in section 8.7 is probably a better example to look at.

> 
> Also, you suggested using a stepwise approach and testing each
> particular hypothesis separately- do you mean within Limma using
> decideTests with the method "separate" or does this entail something
> else? I am still uncertain how to implement this.
> 
> Finally, if it is not too much to ask how does "separate" differ from
> "global"? The Limma manual says that global will treat the entire
> matrix of t-statistics as a single vector of unrelated tests- I
> assume this means independent tests. It seems that "separate" does
> the same since it treats each coefficient separately- obviously they
> are not the same, but I have missed out on recognizing the
> difference.

The difference lies in the way multiplicity is handled. If you use the 
'global' option, then you will have 21 x number of genes p-values that 
will then be adjusted for multiple comparisons. If you use the 
'separate' option, then you will adjust for multiplicity for each 
contrast as if the other contrasts were not made, which is more powerful 
but may lead to more false positives.

Best,

Jim

> 
> 
> Thank you again, Noelle
> 
> 
> 
> 
> 
> 
> "James W. MacDonald" <jmacdon at med.umich.edu> wrote: Hi Noelle,
> 
> noel0925 at sbcglobal.net wrote:
> 
>> Hi All,
>> 
>> I am wondering if someone can explain when it is appropriate to use
>>  the nestedF method of the decideTests function in Limma?
>> 
>> 
>>> From the manual I am aware that this adjusts down genes then
>>> across contrasts and that this strategy is recommended for
>>> complex experiments with many contrasts (like mine) since it may
>>> be "desirable to select genes firstly on the basis of their
>>> moderated F-statistic, and subsequently decide which of the
>>> individual contrasts are significant for the selected gene.
>> 
>> I am interested in identifying genes that are differentially 
>> expressed in many contrasts- in particular across genotypes that
>> are fairly similar, across treatments, and in various
>> genotype:treatment combinations (ie interaction effects). I expect
>> that the genotype effect will alter the same genes for some of the
>> samples. I also expect that the treatment effect will alter some of
>> the same genes across all the genotypes.
>> 
>> Is this an appropriate situation in which to use a nestedF test (I 
>> will also correct for multiple testing using "BH")? Is it correct
>> to form a contrast matrix for all my contrasts of interest
>> (including interaction effects) and test them all simultaneously
>> using decideTests(x, method="nestedF", method.adjust="BH", p=0.05)?
>> 
> 
> 
> I am not sure the nestedF method is appropriate for this situation, 
> because you have interaction terms. When there is an interaction term
> in the model, the usual thing to do is to check for significance of
> the interaction term, and if it isn't significant, then you would
> drop it from the model and check for significance of the main effects
> terms. The nestedF method won't do this - it will treat all the
> contrasts as if equal.
> 
> Another issue that arises with microarray analyses is the usefulness
> of main effects in general. As Gordon mentions in the Limma User's
> Guide, the main effects are usually not that interesting for
> microarray data, and are often nonsensical.
> 
> As an example, say you have wild type and knockout mice that you
> treated with a control vehicle and a drug. Usually one would be
> interested in finding those genes that are affected differently in
> the KO and WT mice when treated with the drug (i.e., the
> interaction). It is usually not interesting to ask which genes are
> differentially expressed in treated vs untreated mice regardless of
> KO/WT status (treatment main effect), or which genes are
> differentially expressed in WT vs KO mice regardless of treatment
> status (mouse main effect).
> 
> Using the nestedF approach with various interactions and main effects
>  looks (to me) like a 'shotgun' approach to the analysis. I think you
>  would be much better served to approach the analysis in a stepwise 
> manner, testing each particular hypothesis separately.
> 
> Best,
> 
> Jim
> 
> 
> 
> 
>> I have done so and as can be expected this returned a far larger 
>> number of DE genes (compared to decideTests(method = "separate", 
>> method.adjust="BH", p=0.05). In fact, the number of genes called 
>> significant by this approach for some of my contrasts is quite alot
>> ~ 5,000/24,000  (only ~ 100 for others) and perahps more than I
>> perhaps want use in for example other analyses such as GeneSet
>> enrichment or GO analyses, or heat maps.
>> 
>> If I wish to only consider a smaller number of genes, is it more 
>> correct in a statistical sense to use a more stringent p-value
>> cutoff after performing decideTests(x, method= "nestedF", 
>> method.adjust="BH", p=0.0001) or to consider the contrasts
>> separately and use a larger p-value cutoff? Clearly, some of these
>> tests are not independent, so I am inclined to go with the nestedF
>> approach and a more stingent cutoff.
>> 
>> I have read through other postings, and found this one especially 
>> helpful: 
>> https://stat.ethz.ch/pipermail/bioconductor/2006-March/012182.html
>>  but am still uncertain about my approach. Any comments would be 
>> appreciated.
>> 
>> Thanks in advance, Noelle
>> 
>> [[alternative HTML version deleted]]
>> 
>> _______________________________________________ Bioconductor
>> mailing list Bioconductor at stat.math.ethz.ch 
>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the 
>> archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.