[BioC] extremely low adjusted p-values in regression using limma

Gordon K Smyth smyth at wehi.EDU.AU
Sun Mar 2 05:02:39 CET 2008


Dear Artur,

> Message: 14
> Date: Fri, 29 Feb 2008 14:53:40 -0500
> From: "Artur Veloso" <abveloso at gmail.com>
> Subject: [BioC] extremely low adjusted p-values in regression using
> 	limma
> To: "Bioconductor List" <bioconductor at stat.math.ethz.ch>
>
> Dear all,
>
> I am trying to run a regression analysis on my microarray data using the
> limma and all the genes are showing adjusted p-values lower than 10 ^ - 20,
> which points me to some sort of major mistake on the code that I am writing
> for this analysis.  I searched the limma user's guide and the Bioconductor
> mail archive for explanations on how to run regressions on limma but could
> not figure out what I was doing wrong from them.

You are getting very small p-values because you are testing the intercept 
equal to zero, which obviously it isn't.  All the relevant examples in the 
User's Guide advise you to use topTable(fit,coef=2), which would solve 
your problem.

> The code that I am using
> goes below and any help with it would be extremely appreciated.
> Also, is it possible to get the r-squared values for each gene in the
> analysis?

R-squared is of no interest in ANOVA situations.  Even in regression with 
normally distributed covariates it is of limited interest, because it is 
just a transformation of the F-statistic.  You could express the moderated 
F-statistic as an Rsquared proportion by:

   fit2 <- eBayes(fit[,-1])  # remove the intercept
   df.total <- fit2$df.residual+fit2$df.prior
   Ratio <- fit2$F*fit$rank/df.total
   Rsquared <- Ratio/(1+Ratio)

Best wishes
Gordon

> Thank you very much,
>
> Artur Veloso
> Masters in Marine Biology Candidate
> College of Charleston, SC, USA
>
>> concentrations
> [1]  9.020  1.740  0.960  9.740  2.460  1.810  1.478  1.380  2.760  0.860
> [11]  2.520  3.950  5.210  1.090 26.760  1.670  0.860  1.340  7.500  5.380
>> regression.design <- model.matrix(~log(concentrations))
>> regression.design
>   (Intercept) log(concentrations)
> 1            1          2.19944433
> 2            1          0.55388511
> 3            1         -0.04082199
> 4            1          2.27624112
> 5            1          0.90016135
> 6            1          0.59332685
> 7            1          0.39068982
> 8            1          0.32208350
> 9            1          1.01523068
> 10           1         -0.15082289
> 11           1          0.92425890
> 12           1          1.37371558
> 13           1          1.65057986
> 14           1          0.08617770
> 15           1          3.28690824
> 16           1          0.51282363
> 17           1         -0.15082289
> 18           1          0.29266961
> 19           1          2.01490302
> 20           1          1.68268837
> attr(,"assign")
> [1] 0 1
>> dim(vsn.normalized)
> [1] 12426    20
>> data.correlation <- duplicateCorrelation(vsn.normalized,regression.design
> ,ndups=2,spacing=1)
>> data.regression <- lmFit(vsn.normalized,regression.design
> ,ndups=2,spacing=1,correlation=data.correlation$consensus)
>> tail(topTable(eBayes(data.regression),number=6200))
>                                                                 ID
> 1808 CV133193 triacylglycerol lipase activity CV_HP_010_Plate_96_B8
> 5545                    CD648266 none CV_UNI_001_Plate_96_G6_bottom
> 5267     CV088280 integral to membrane CV_HP_009_Plate_96_D9_bottom
> 4055                    CV133023 none CV_HP_011_Plate_96_D11_bottom
> 5935                     CV132240 none CV_HP_004_Plate_96_C6_bottom
> 4392                    AJ565488 none CG_NS1_001_Plate_96_G5_bottom
>     X.Intercept. log.concentrations.        F      P.Value
> 1808     11.87372          -0.0947231 250.6960 2.810474e-24
> 5545     13.32032           0.1710411 244.8938 4.447522e-24
> 5267     11.13640          -0.1087474 242.2680 5.492360e-24
> 4055     11.03485          -0.1456426 239.7445 6.740600e-24
> 5935     10.69073           0.9076729 235.8741 9.263810e-24
> 4392     11.59845          -0.7217251 234.0277 1.079942e-23
>        adj.P.Val
> 1808 2.818640e-24
> 5545 4.459725e-24
> 5267 5.506541e-24
> 4055 6.756913e-24
> 5935 9.284731e-24
> 4392 1.082206e-23
>
>> sessionInfo()
> R version 2.6.1 (2007-11-26)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods
> base
>
> other attached packages:
> [1] statmod_1.3.3        vsn_3.2.1            affy_1.16.0
> preprocessCore_1.0.0 affyio_1.6.1
> [6] Biobase_1.16.2       limma_2.12.0
>
> loaded via a namespace (and not attached):
> [1] grid_2.6.1     lattice_0.17-2



More information about the Bioconductor mailing list