[BioC] meaning of logFC in regression and how to run a multiple regression in limma

Gordon K Smyth smyth at wehi.EDU.AU
Thu Mar 6 01:40:00 CET 2008



On Wed, 5 Mar 2008, Artur Veloso wrote:

> Dear Gordon and others,
>
> I have two questions regarding regressions on limma and would greatly
> appreciate some guidance:
>
> 1- This might be just my lack of statistics knowledge, but I can't wrap my
> head around what the logFC represents in a regression.  In a situation where
> there are categories (like an anova) I understand that the logFC would be
> the log of the difference in expression between the two groups, and that is
> similar to what the explanation found on the topTable help page.  But when
> there are no groups, what does the logFC represent? Or is it nonsense to
> even use logFC in such situation?

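Whatever the linear model you fit, the logFC is the estimated coefficient 
for that gene and that covariate.  For a continuous covariate, that is a 
slope: the estimated change in log-expression per unit change in the 
covariate.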
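
As a minimal sketch of that interpretation, using base R only and made-up 
numbers (the names log_conc, noise and log_expr are hypothetical, not from 
the original data), the coefficient from an ordinary regression of one 
gene's log-expression on a continuous covariate is simply a slope:

```r
# Toy data for a single gene; every value here is invented for illustration.
log_conc <- c(0, 1, 2, 3, 4)              # continuous covariate, e.g. log concentration
noise    <- c(0.1, -0.2, 0.2, -0.2, 0.1)  # sums to zero and is orthogonal to log_conc
log_expr <- 8 + 0.5 * log_conc + noise    # log2 expression with a true slope of 0.5

# Ordinary least-squares regression for this one gene.
fit   <- lm(log_expr ~ log_conc)
slope <- coef(fit)[["log_conc"]]

# The slope plays the role of the logFC for this covariate: the estimated
# change in log2 expression per unit increase in log_conc.
print(slope)
```

The noise vector is chosen so that it does not disturb the least-squares 
fit, which makes the estimated slope easy to compare with the value used 
to build the data.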

> 2- Is it possible to run multiple regressions in limma?

You've already done so in your lmFit below.

> I was trying two
> different approaches to do this, but I couldn't find anything on the user
> guide that pointed me to do them so I'm afraid that what I was doing was
> wrong.  My approaches were either creating contrasts between the two
> continuous variables or using more than one column as the coefficient in a
> topTable function call.  Are any of these approaches acceptable or should I
> be doing this in a different way?

Contrasts usually have no meaning for continuous covariates.  I have no 
idea what you're trying to achieve by using makeContrasts().

I suggest that you consult a statistician at your own college about the 
meaning of multiple regression coefficients for your experiment.  limma is 
returning you ordinary multiple regression coefficients for each gene.  To 
make proper use of limma, you do need to have an understanding of the 
corresponding multiple regression problem in a univariate sense.
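
The univariate view can be sketched as follows, under stated assumptions: 
base R only, simulated data, and no duplicate spots or correlation 
(unlike the lmFit call quoted below).  The objects cad, cu, zn, true_coef 
and expr are hypothetical stand-ins, not the original data.  Fitting all 
genes at once against one design matrix gives the same coefficients as 
running an ordinary multiple regression gene by gene:

```r
set.seed(1)
n_arrays <- 12

# Simulated metal concentrations (hypothetical values).
cad <- runif(n_arrays, 1, 10)
cu  <- runif(n_arrays, 1, 10)
zn  <- runif(n_arrays, 1, 10)

# Same style of design matrix as in the quoted code.
design <- model.matrix(~ log(cad) + log(cu) + log(zn))
colnames(design) <- c("Intercept", "cad", "cu", "zn")

# Two simulated genes (rows) with known coefficients, plus a little noise.
true_coef <- rbind(gene1 = c(8, 0.5,  0, 0),
                   gene2 = c(6, 0,   -1, 0.25))
expr <- true_coef %*% t(design) +
  matrix(rnorm(2 * n_arrays, sd = 0.1), nrow = 2)

# Least squares for every gene at once: one row of coefficients per gene,
# one column per covariate.
coefs <- t(lm.fit(design, t(expr))$coefficients)

# The same numbers from an ordinary multiple regression on gene1 alone.
fit1 <- lm(expr["gene1", ] ~ log(cad) + log(cu) + log(zn))

print(coefs)
print(coef(fit1))
```

Each column of coefs other than the intercept is a slope for one 
covariate, adjusted for the other two, and should be read in the same way 
as a coefficient from a single multiple regression.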

Best wishes
Gordon

> Thank you very much,
>
> Artur Veloso
> Master's in Marine Biology Candidate
> College of Charleston, SC, USA
>
>
> #trying to use contrasts
> all.design <- model.matrix(~log(cad.concentrations)+log(cu.concentrations)+log(zn.concentrations))
> colnames(all.design) <- c("Intercept","cad","cu","zn") #this is necessary because the parentheses on "intercept" cause the function to give an error message
> all.correlation <- duplicateCorrelation(vsn.normalized,all.design,ndups=2,spacing=1)
> all.regression <- lmFit(vsn.normalized,all.design,ndups=2,spacing=1,correlation=all.correlation$consensus)
> all.contrast <- makeContrasts(cad+cu+zn,levels=all.design)
> topTable(eBayes(contrasts.fit(all.regression,all.contrast)))
>
>
> #using more than one column in the coefficients for topTable
> all.design <- model.matrix(~log(cad.concentrations)+log(cu.concentrations)+log(zn.concentrations))
> all.correlation <- duplicateCorrelation(vsn.normalized,all.design,ndups=2,spacing=1)
> all.regression <- lmFit(vsn.normalized,all.design,ndups=2,spacing=1,correlation=all.correlation$consensus)
> topTable(eBayes(all.regression),coef=c(2,3,4))


