[BioC] necessity of moderated t statistic and false discoveries for small predefined gene list?

Richard Friedman friedman at cancercenter.columbia.edu
Thu May 17 15:25:14 CEST 2012


Moshe,

	Thank you for the clarification on the moderated t-statistic.
If I am only interested in 10 genes, is it better to calculate the
moderated statistic, and hence the raw p-values, based on all of the
genes on the array or on just those 10 genes?

Best wishes,
Rich

On May 17, 2012, at 12:35 AM, Moshe Olshansky wrote:

> Hi Rich,
>
> I think that Gordon Smyth (the author of limma) has explained on this
> list what the moderated t-statistic is.
> The brief explanation is that when there are few samples, the estimate
> of the variance used in a standard t-test is quite noisy, and because
> one must account for this noise, the standard t-test has low statistical
> power. The Empirical Bayes model used in the moderated t-test allows the
> variance to be estimated with more confidence and therefore gives better
> power. So it can be used even if you are interested in just a few genes.
> It has (almost) nothing to do with the multiple testing adjustment.
> Well, one may ask whether moderated p-values satisfy the assumptions of
> multiple testing adjustment procedures (in particular BH), but that is
> another story. Maybe Gordon will comment on this.
>
> Best regards,
> Moshe.
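
To make the explanation above concrete, here is a minimal sketch of how a moderated t-statistic differs from an ordinary one for a single gene in a two-group comparison. It is written in Python purely for illustration and is not limma's code. The shrinkage formula follows the empirical Bayes model behind limma, but the prior variance `s0_sq` and prior degrees of freedom `d0` are passed in as assumed values here, whereas limma estimates them from all the genes on the array. The function names and the expression values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_variance(a, b):
    """Pooled sample variance of two groups and its residual degrees of freedom."""
    d = len(a) + len(b) - 2
    s2 = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / d
    return s2, d


def ordinary_t(a, b):
    """Standard two-sample t-statistic (equal variances) and its degrees of freedom."""
    s2, d = pooled_variance(a, b)
    se = math.sqrt(s2 * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, d


def moderated_t(a, b, s0_sq, d0):
    """Moderated t-statistic: the gene's variance is shrunk toward a prior value.

    s0_sq and d0 are ASSUMED inputs in this sketch; limma estimates them
    from the variances of all genes on the array.
    """
    s2, d = pooled_variance(a, b)
    # Weighted average of the prior variance and the gene's own variance.
    s2_tilde = (d0 * s0_sq + d * s2) / (d0 + d)
    se = math.sqrt(s2_tilde * (1 / len(a) + 1 / len(b)))
    # The shrunken variance carries d0 extra degrees of freedom.
    return (mean(a) - mean(b)) / se, d0 + d


# Made-up log-expression values for one gene, three arrays per condition.
treated = [7.9, 8.0, 8.1]
control = [7.4, 7.5, 7.6]

t_ord, df_ord = ordinary_t(treated, control)
t_mod, df_mod = moderated_t(treated, control, s0_sq=0.05, d0=4)
print(f"ordinary  t = {t_ord:.3f} on {df_ord} df")
print(f"moderated t = {t_mod:.3f} on {df_mod} df")
```

In this made-up case the gene's own variance happens to be much smaller than the assumed prior variance, so the moderated statistic is smaller in magnitude than the ordinary one but is referred to a t-distribution with more degrees of freedom. Nothing in the calculation depends on how many genes one later chooses to report.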
>
>> Moshe and List,
>>
>> 	Thanks for your reply. The method you describe retains
>> the raw p-value based on the moderated t-statistic and adjusts
>> it to give an adjusted p-value (usually a false discovery rate).
>> However, as I understand it, the moderated t-statistic given by
>> Limma, based on all of the genes on the array, pools variance
>> information to moderate the standard deviation, preventing the
>> fortuitously low p-values that stem from fortuitously low standard
>> deviations encountered in thousands of multiple tests. I am wondering
>> whether, if the experimentalist asks me to look up just 10 genes,
>> I should use the unmoderated frequentist t-statistic, which
>> will differ from the one in Limma and may imply significance
>> where Limma does not. I guess another way to phrase it is
>> "How many simultaneous tests does one need before one
>> should prefer the moderated (empirical Bayes) statistic to
>> the unmoderated one?" Or should I fit just those 10 genes
>> (~30 affy probes) with Limma?
>>
>> Best wishes,
>> Rich
>>
>>
>>
>> On Thu, 17 May 2012, Moshe Olshansky wrote:
>>
>>> Hi Rich,
>>>
>>> Whether to use the moderated t-statistic or not does not depend on
>>> whether you are interested in the 10 particular genes or in all
>>> differentially expressed ones. That choice affects only your multiple
>>> testing adjustment.
>>> The simplest way for you to proceed is to use limma as usual, get the
>>> topTable, but then take the UNADJUSTED p-values for your 10 genes of
>>> interest and use the p.adjust function to adjust for multiple testing
>>> if you wish. In any case you should also look at the (log) fold changes.
>>>
>>> Best regards,
>>> Moshe.
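
The suggestion above (take the unadjusted p-values for the genes of interest and adjust only those) can be sketched as follows. In R this amounts to calling p.adjust with method = "BH" on the subset of p-values; the Python below reimplements just the Benjamini-Hochberg step for illustration. The gene names and p-values are made up, and `bh_adjust` is a hypothetical helper, not part of limma.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # rank 1 is the smallest p-value
        # Scale by n / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up unadjusted (moderated) p-values, as they might be read from a topTable.
raw_p = {"geneA": 0.001, "geneB": 0.012, "geneC": 0.030,
         "geneD": 0.200, "geneE": 0.700}

# Hypothetical predefined genes of interest.
genes_of_interest = ["geneA", "geneC", "geneD"]

# Adjust only the p-values of the predefined genes.
subset_p = [raw_p[g] for g in genes_of_interest]
subset_adj = dict(zip(genes_of_interest, bh_adjust(subset_p)))

for gene in genes_of_interest:
    print(gene, raw_p[gene], round(subset_adj[gene], 4))
```

The raw p-values still come from the fit to the whole array; only the number of tests entering the adjustment changes.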
>>>
>>>
>>>> Dear Bioconductor List,
>>>>
>>>> 	I am using Limma to analyze differential expression between 2
>>>> conditions on an Affy chip.
>>>> My experimental collaborator asks for the differential expression of
>>>> 10 predefined genes.
>>>>
>>>> A. Should I correct for false discoveries based upon all of the genes
>>>> on the chip?
>>>> B. If not, should I correct for false discoveries just for the
>>>> probe IDs for the 10 predefined genes?
>>>> C. Should I use the moderated t-statistic or just use an unmoderated
>>>> t-test for those 10 genes?
>>>>
>>>> Thanks and best wishes,
>>>> Rich
>>>> ------------------------------------------------------------
>>>> Richard A. Friedman, PhD
>>>> Associate Research Scientist,
>>>> Biomedical Informatics Shared Resource
>>>> Herbert Irving Comprehensive Cancer Center (HICCC)
>>>> Lecturer,
>>>> Department of Biomedical Informatics (DBMI)
>>>> Educational Coordinator,
>>>> Center for Computational Biology and Bioinformatics (C2B2)/
>>>> National Center for Multiscale Analysis of Genomic Networks  
>>>> (MAGNet)
>>>> Room 824
>>>> Irving Cancer Research Center
>>>> Columbia University
>>>> 1130 St. Nicholas Ave
>>>> New York, NY 10032
>>>> (212)851-4765 (voice)
>>>> friedman at cancercenter.columbia.edu
>>>> http://cancercenter.columbia.edu/~friedman/
>>>>
>>>> "School is an evil plot to suppress my individuality"
>>>>
>>>> Rose Friedman, age 15
>>>>
>>>
>>>
>>>
>>
>> --
>> ------------------------------------------------------------
>> Richard A. Friedman, PhD
>> Associate Research Scientist
>> Herbert Irving Comprehensive Cancer Center
>> Biomedical Informatics Shared Resource
>> Lecturer
>> Department of Biomedical Informatics
>> Box 95, Room 130BB or P&S 1-420C
>> Columbia University Medical Center
>> 630 W. 168th St.
>> New York, NY 10032
>> (212)305-6901 (5-6901) (voice)
>> friedman at cancercenter.columbia.edu
>> http://cancercenter.columbia.edu/~friedman/
>>
>> "The last 250 pages of the last Harry Potter
>> book took place in one day because alot
>> happened in that day. All of Ulysses takes
>> place in one day and nothing happened in that day."
>> -Rose Friedman, age 11
>>
>>
>
>
>