[BioC] Interpretation of fold.changes for covariates in a linear model using voom

Francesco Gatto [guest] guest at bioconductor.org
Mon Apr 14 15:22:45 CEST 2014

I have implemented a linear model to fit RNAseq read counts for ~1000 samples using ~200 covariates plus the intercept. The pipeline follows the example in the user manual, i.e. v=voom(y,design) -> fit=lmFit(v,design) -> fit2=eBayes(fit). Then I am using decideTests() to assess the effect of each covariate on gene expression. However, the output is less complete than topTable(), so I used this function to obtain values on fold-changes and individual p-values. At this stage, values become difficult to interpret. If the LM is:

gene_i,j = intercept_i + alfa_i*Factor1_j + beta_i*Factor2_j + (~200 other factors)

and, to simplify, let's say covariates are binary. Then what do the fold-changes for topTable(fit2,coef="alfa") mean?
My interpretation was the up/down-regulation due to the presence of covariate alfa, but if I boxplot or summarize the expression values for the two groups (alfa = 1 vs alfa = 0) I get quite contradictory results. For example this is the most DE gene when the sample has a mutation in KRAS (the covariate, 0 if wild-type, 1 if mutated):

> topTable(fit2,coef="metadataKRAS",number=1)

hgnc_symbol     logFC    AveExpr         t      P.Value    adj.P.Val        B
 GOLGA6A -1.314213 -5.9000494 -7.375175 3.800569e-13 7.204358e-09 18.80961

The summary of gene expression values for wild-type KRAS vs. mutated is:

> by(E["GOLGA6A",],design[,"metadataKRAS"],summary)

design[, "metadataKRAS"]: 0
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -7.882  -6.865  -6.461  -6.021  -5.770   1.548 
design[, "metadataKRAS"]: 1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -7.657  -6.247  -5.487  -5.239  -4.582  -1.311 

This is an up-regulation, and of magnitude way lower than the one indicated by topTable. Is this because I am misusing the function topTable? I cannot wrap my head around this. Thanks for your help!

 -- output of sessionInfo(): 

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

[1] sv_SE.UTF-8/sv_SE.UTF-8/sv_SE.UTF-8/C/sv_SE.UTF-8/sv_SE.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.4.2  limma_3.18.9

loaded via a namespace (and not attached):
[1] tools_3.0.2

Sent via the guest posting facility at bioconductor.org.

More information about the Bioconductor mailing list