[BioC] how to test for genes of interest?

Thu Jul 24 20:13:06 CEST 2008

Hi Glyn, Jenny

I think also that it is a valid point of view to use test p-values to
rank the genes by priority, without believing the p-values literally,
but to still use them, as well as further information (e.g., other data,
known controls, gene set enrichment analysis or its various
commercializations such as Ingenuity) to decide where to cut off.

Probabilistic computations, such as FDR, can be highly instructive and
useful; but one should not get carried away and confuse the probability
model that produced the p-values you're looking at with an actual
ensemble of experiments. (By doing so, one might in fact do a disservice
to biologists, who often view the microarray as a discovery tool in a
context, not as a standalone confirmation.)

Also note that care is needed if the criterion that selects your gene
subset is data-driven. There are a number of papers about this (and I
guess there will be more), but the bottom line is that, if you do care
about the multiple testing aspect, you're not "cheating" in the multiple
testing correction as long as the criterion is unaware of the contrast
of interest that is subsequently tested.
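
To make the last point concrete, here is a minimal sketch in Python (not
limma code). It assumes the per-gene p-values came from a model fitted
on all genes, and it uses overall variance across all samples, computed
without looking at group labels, as one example of a criterion that is
unaware of the contrast. The gene names, expression values, p-values,
the threshold, and the function names are all made up for illustration.

```python
from statistics import pvariance


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_adjust(expr, pvals, min_var):
    """Keep genes by overall variance (labels ignored), then adjust.

    expr:  gene -> expression values across ALL samples
    pvals: gene -> p-value for the contrast, from the full-data model
    """
    kept = [g for g in expr if pvariance(expr[g]) >= min_var]
    return dict(zip(kept, bh_adjust([pvals[g] for g in kept])))


# Hypothetical toy data: four genes, four samples.
expr = {
    "geneA": [5.0, 5.1, 7.9, 8.0],
    "geneB": [6.0, 6.0, 6.1, 6.0],
    "geneC": [4.0, 6.0, 4.2, 6.1],
    "geneD": [7.0, 7.1, 7.0, 7.1],
}
pvals = {"geneA": 0.001, "geneB": 0.40, "geneC": 0.03, "geneD": 0.20}

result = filter_then_adjust(expr, pvals, min_var=0.5)
for gene, adj_p in result.items():
    print(gene, round(adj_p, 4))
```

The same bh_adjust() call works for an a priori gene list: pick the
p-values for the listed genes and adjust only those. What would not be
legitimate in this sense is choosing the subset by looking at the very
contrast being tested, for example keeping the genes with the largest
fold change between the two groups.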

Best wishes
 Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber

24/07/2008 16:48 Glyn Bradley scripsit
> Hi Jenny
> I may get shot down horribly for saying this on this list, but isn't
> there a large school of thought which says don't do FDR at all, just
> take the large list of genes out and mine it for biological
> significance?
> Certainly a large pharma I've a little experience of takes that
> approach. Stats are just stats after all. (And I'm sure you're going to
> validate the results with some other wet lab technique anyway.)
> 
> 
> Glyn PhD
> Bioinf and systems modelling
> mycib.ac.uk
> 
> On Thu, Jul 24, 2008 at 4:14 PM, Jenny Drnevich <drnevich at illinois.edu> wrote:
>> Hi everyone,
>>
>> I've always heard that one of the ways "around" the multiple testing problem
>> of microarrays is for you to a priori identify a particular list of genes
>> you're interested in, and then you only have to do the multiple test
>> correction for this smaller list. I've never done this in practice, and I'm
>> not sure at what point in the analysis it's proper to pull out just the
>> smaller list. Obviously, all the data preprocessing and normalization will
>> be done with all the genes, but should I pull out the genes before fitting
>> the model, or after fitting the model right before the multiple test
>> adjustment? I'm using the eBayes() shrinkage in limma, so which genes are in
>> the model will make a big difference in the outcome.
>>
>> I'm thinking it would be best to keep all the genes in the model, and then
>> split them out into two groups (genes of interest and all the rest) and do an
>> FDR correction separately for each group. What do you think?
>>
>> Thanks,
>> Jenny
>>
>> Jenny Drnevich, Ph.D.
>>
>> Functional Genomics Bioinformatics Specialist
>> W.M. Keck Center for Comparative and Functional Genomics
>> Roy J. Carver Biotechnology Center
>> University of Illinois, Urbana-Champaign
>>
>> 330 ERML
>> 1201 W. Gregory Dr.
>> Urbana, IL 61801
>> USA
>>
>> ph: 217-244-7355
>> fax: 217-265-5066
>> e-mail: drnevich at illinois.edu