[BioC] prefiltering genelist before LIMMA & cut-off parameter for getting DEGs from topT object.

Fri Dec 6 20:18:05 CET 2013

Hi,

On Fri, Dec 6, 2013 at 12:16 AM, deb [guest] <guest at bioconductor.org> wrote:
>
> Dear all,
>
> I am grateful to you for the prompt reply to my query.
>
> 1) So I think adj.p.value with the "BH" adjust.method (which also takes "FDR" issues into consideration) will be my choice for extracting significant DEGs.
> But I am looking into the propnulltrue function also to get more ideas.
>
> 2) I do however need some ideas on your suggestion regarding "If you want to give even more priority to larger fold changes, then we recommend that you use treat(). This is better than just cutting on estimated logFC value."
>
> Can you please enlighten me on the issue?

Please take the time to read through the help you land on when you
invoke `?treat` from your R session.

In the "Details" section you'll find:

"""
`treat` computes empirical Bayes moderated-t p-values relative to a
minimum required fold-change threshold. Use topTreat to summarize
output from treat. Instead of testing for genes which have
log-fold-changes different from zero, it tests whether the
log2-fold-change is greater than lfc in absolute value (McCarthy and
Smyth, 2009).
"""

The paper referenced there is:

McCarthy DJ, Smyth GK (2009). Testing significance relative to a
fold-change threshold is a TREAT. Bioinformatics 25(6):765.
http://bioinformatics.oxfordjournals.org/content/25/6/765.abstract

I suspect you will find more enlightenment than you bargained for there ;-)
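
To make the difference concrete, here is a minimal sketch on simulated
data. The matrix `expr`, the two-group design, and the choice of
`lfc = 1` (a 2-fold threshold) are all assumptions made up for
illustration; substitute your own normalised data, design, and
threshold.

```r
library(limma)

set.seed(1)
# Simulated log2 expression (made-up data): 1000 genes, 3 control + 3 treated arrays
n_genes <- 1000
expr <- matrix(rnorm(n_genes * 6, mean = 8, sd = 0.3), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("s", 1:6)))
# Give the first 50 genes a true log2 fold change of 2 in the treated arrays
expr[1:50, 4:6] <- expr[1:50, 4:6] + 2

group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)   # second column is "grouptrt"

fit <- lmFit(expr, design)

# Usual moderated t-test: null hypothesis is logFC == 0
fit_eb <- eBayes(fit)
tt_eb  <- topTable(fit_eb, coef = "grouptrt", number = Inf,
                   adjust.method = "BH")

# TREAT: null hypothesis is |logFC| <= lfc (lfc = 1 is an arbitrary choice here)
fit_tr <- treat(fit, lfc = 1)
tt_tr  <- topTreat(fit_tr, coef = "grouptrt", number = Inf,
                   adjust.method = "BH")

# Number of genes called at 5% FDR under each null hypothesis
n_eb <- sum(tt_eb$adj.P.Val < 0.05)
n_tr <- sum(tt_tr$adj.P.Val < 0.05)
c(eBayes = n_eb, treat = n_tr)
```

Note that the threshold goes into the test itself, so the adjusted
p-values from topTreat already account for it. That is the difference
from running eBayes and then throwing away rows with a small estimated
logFC.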

> 3) I have one more question regarding the utility of genefilter() in
> pre-filtering normalised data based on mean expression intensity prior to
> LIMMA. In all the available examples, filtering has been done with a minimum
> expression intensity of 3.5, and the follow-up statistics are multtest, SAM or
> a t-test with multiple-testing adjustment. Do you recommend doing it before LIMMA?

If you read through the list archives, you will find different
opinions on this matter being put forth by people with the appropriate
"bona fides" to argue their point.

You might try a few different approaches (filter, don't filter, etc.)
to see if it makes a substantial difference to your results. Check
the p-value distribution you get with and without filtering to at
least give you some peace of mind that what you are doing is not insane.
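
A rough sketch of that kind of check, again on simulated data: the
matrix `expr`, the helper `run_limma`, and the cutoff of 3.5 (borrowed
from your question, not a recommendation) are assumptions for
illustration only. The filter here uses the overall mean intensity
across all arrays, without looking at the group labels.

```r
library(limma)

set.seed(2)
# Simulated log2 intensities (made-up data): 2000 probes, 3 vs 3 arrays,
# each probe with its own baseline level between 2 and 12
n    <- 2000
base <- runif(n, 2, 12)
expr <- matrix(rnorm(n * 6, mean = base, sd = 0.4), nrow = n)
rownames(expr) <- paste0("probe", seq_len(n))

# Spike a true log2 fold change of 1.5 into 100 of the brighter probes
de <- sample(which(base > 6), 100)
expr[de, 4:6] <- expr[de, 4:6] + 1.5

design <- model.matrix(~ factor(rep(c("ctrl", "trt"), each = 3)))

# Hypothetical helper: fit, moderate, and return the full table in input order
run_limma <- function(x) {
  fit <- eBayes(lmFit(x, design))
  topTable(fit, coef = 2, number = Inf, sort.by = "none")
}

cutoff <- 3.5                       # assumed cutoff, taken from the question
keep   <- rowMeans(expr) > cutoff   # filter on overall mean, ignoring groups

res_all  <- run_limma(expr)
res_filt <- run_limma(expr[keep, ])

# Compare the raw p-value distributions side by side
op <- par(mfrow = c(1, 2))
hist(res_all$P.Value,  breaks = 20, main = "No filter",
     xlab = "p-value")
hist(res_filt$P.Value, breaks = 20, main = "Mean intensity filter",
     xlab = "p-value")
par(op)

# Number of probes called at 5% FDR with and without the filter
c(unfiltered = sum(res_all$adj.P.Val < 0.05),
  filtered   = sum(res_filt$adj.P.Val < 0.05))
```

What you are hoping to see in both histograms is a roughly flat
distribution with a spike near zero. If filtering changes that shape
drastically, that is worth understanding before you go further.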

In the end, you should convince yourself that you are not torturing
your data to make it say something you wanted to hear, because you
will eventually be in the position of having to convince someone else
of the same.

HTH,
-steve

-- 
Steve Lianoglou
Computational Biologist
Genentech