[BioC] p-values from robust linear model in LIMMA

Gordon K Smyth smyth at wehi.EDU.AU
Tue Jun 3 02:10:05 CEST 2014

Dear Arvid,

You are right that the theory of hypothesis testing for robust regression
is problematic.  What limma does is to simply take the scale estimates and
robustifying weights from rlm() and input them into the usual limma
pipeline.  This is intuitively reasonable because it downweights values
that have been identified as potential outliers, and I wanted to include
rlm as an option because of some early microarray studies that used it:

http://www.pnas.org/content/100/9/5491.short

However the approximation to a t-distribution is fairly rough and I can't
give you good guidelines as to how reliable the p-values are.  The idea is
that you can still interpret the logFC, or use the B-statistics as a gene
ranking, even if you don't trust the p-values.

An alternative is to go back to lmFit() with method="ls" but to use
robust=TRUE (and perhaps trend=TRUE as well) at the eBayes() step instead.
This approach does have a rigorous theoretical underpinning:

http://www.statsci.org/smyth/pubs/RobustEBayesPreprint.pdf

In practice, my lab doesn't use method="robust" for our own analyses.
When we want to robustify, we do it at the eBayes step instead.

Best wishes
Gordon

> Date: Mon, 2 Jun 2014 07:23:04 +0000
> From: Arvid Sond?n <arvid.sonden at gu.se>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] p-values from robust linear model in LIMMA
>
> Dear all,
>
> I am currently working with expression data in LIMMA and have been asked
> to fit a robust linear model:
>
> lm.fit.rob <- lmFit(object=y\$E, design=m.matrix, method="robust")

You could simply use lmFit(y, design=m.matrix, etc).  No need for y\$E.

> lm.fit.rob.bayes <- eBayes(lm.fit.rob)
> lm.fit.rob.bayes.tt <- topTable(lm.fit.rob.bayes, coef="Group")
>
> It is easy to find that lmFit uses mrlm, and that mrlm uses rlm.
> However, rlm does not generate p-values, and from what I have read (e.g.
> http://r.789695.n4.nabble.com/p-values-td803236.html) it is not a
> trivial thing to do. What confuses me is that topTable generates
> p-values, and I can't find in the documentation how and on which
> assumptions.
>
> The robust models do greatly improve my p-values, but can they be
> trusted?
>
> Best regards,
>
> Arvid

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}