[BioC] Unexpected results using limma with numerical factor - regression intercept?

Matthew Hannah Hannah at mpimp-golm.mpg.de
Thu Aug 26 15:11:37 CEST 2004


Sorry for getting things confused, I'd like to ask a statistician but we
don't really have one :(
But hopefully I'm improving...

My mistakes were -
1. I mixed up lmFit (limma) with lm.fit (stats) - sorry, one character's
not a lot when you're in a hurry ;)

2. Being unsure of how lmFit handles a factor c(1,2,3,...) vs a numeric
vector c(1,2,3,...), and thinking this example in the limma user guide
meant negative numbers could be confused with dye swaps:
> design <- c(-1,1,-1,1)  # "The negative numbers in the design matrix indicate the dye-swaps."
> fit <- lmFit(MA,design)
Sorry, no real excuse there, it just seemed logical when I typed
it... doh
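
(For anyone else confused by this, here's a minimal sketch - the
covariate x is made up - of how model.matrix() codes a factor versus a
numeric vector: the factor gets an indicator column per non-reference
level, the numeric vector a single slope column.)

x <- c(1, 2, 3, 1, 2, 3)   # hypothetical covariate: 3 values, 2 reps each
model.matrix(~ factor(x))  # intercept + indicator columns factor(x)2, factor(x)3
model.matrix(~ x)          # intercept + one column holding x itself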

Anyway, after much searching I *think* I've found out how to use limma
for (standard?) regression, and might even be able to solve the
original poster's query...

lm() and (I guess) lmFit have an implied intercept term, so the design
of the original poster's fit
> design <- model.matrix(~ -1 + TestScore1, pData(eset))
removes the intercept and so forces the fit to go through zero?? For
linear regression against a quantitative variable you need to allow an
intercept to be fitted (unless you have a reason to think the line goes
through zero)??
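
Here's a toy illustration (numbers made up) of what removing the
intercept does when the true line doesn't pass through the origin:

x <- 1:7
y <- 10 + 2 * x + rnorm(7)   # true intercept 10, true slope 2
coef(lm(y ~ x))        # intercept fitted: slope estimate is close to 2
coef(lm(y ~ -1 + x))   # intercept forced to zero: the slope is inflated
                       # (to roughly 4 here) to compensate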

Maybe I'm wrong, but here's my logic -
num <- c(1,2,3,4,5,6,7)  # measured growth, etc.
# Pearson correlation and its p-value - note cor() and cor.test() take
# two vectors here, not a formula
cor(exprs(esetgcrma)[1,], num)
cor.test(exprs(esetgcrma)[1,], num)

# lm - summary() shows R-squared (Pearson squared) and the same p-value
# as above, plus the fitted coefficients
summary(lm(exprs(esetgcrma)[1,] ~ num))

# limma - produces the same coefficients as lm() above
design <- model.matrix(~ num)
fit <- lmFit(esetgcrma, design)

So limma produces the 'same' results as Pearson and lm(), but you can
carry them on for further analysis (see the sketch below)...
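
For example (an untested sketch - the coefficient name comes from the
design column created by model.matrix(~ num) above):

fit <- eBayes(fit)           # moderated t-statistics for every gene
topTable(fit, coef = "num")  # top genes ranked by the slope on num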

So is this correct, or should I go back to wearing the dunce hat in the
corner?

Cheers,
Matt

> -----Original Message-----
> From: Gordon Smyth [mailto:smyth at wehi.edu.au] 
> Sent: Donnerstag, 26. August 2004 09:38
> To: Matthew Hannah
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Unexpected results using limma with 
> numerical factor
> 
> At 05:40 PM 25/08/2004, Matthew Hannah wrote:
> >I've recently asked a similar question but have got no feedback; see
> >point 2 onwards in the following:
> >https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-August/005802.html
> >
> >As I now understand it (perhaps/probably wrongly) the standard
> >approach (equivalent to ANOVA?) is to use a non-numeric factor for
> >the fit. However, lm() in R is also capable of regression fits.
> >Looking at (but not understanding) lm.fit in limma, it appears to be
> >an independent function (the lm bit written in Fortran?) and doesn't
> >call lm() from stats. So the question is really whether lm.fit can
> >do numeric regression.
> 
> I am afraid that you've got things a bit skew. lm.fit() is not a
> function in limma; rather, it is a low-level function in the stats
> package which is called by lm(). The limma functions use lm.fit() and
> lm.wfit() just as lm() does.
> 
> It isn't really helpful to think of ANOVA in the microarray 
> context. ANOVA is a special case of regression.
> 
> >Another consideration is that you wouldn't use a design (factor) but
> >a numeric vector. My guess is that your design is being taken as a
> >factor, and (if you look in the user guide) the negative values may
> >indicate dye swaps, which could be interesting! I've just tried lmFit
> >with a numeric vector (but as there are replicates, each number
> >appears 3 times) and got meaningless results - all lods > 30 and all
> >tiny p-values ~1e-24. So initially it looks like you can't use limma
> >like this, but I'd like to hear an expert verdict on whether the
> >approach is completely wrong.
> 
> I don't think I understand your question. Have a talk with a 
> statistician at your own institution about factors, 
> covariates and design matrices.
> 
> Gordon
> 
> >Anyway, if you're looking for correlations then you could perhaps
> >try Pearson (see my previous post about p-values and R2 from lm()).
> >Try -
> >
> >Test1 <- c(0.58,-2.36,-12.24,-14.84,0.15,-3.23,-11.66,-12.91)
> >Correl <- esApply(eset, 1, function(x) { cor(x, Test1) })
> >
> >Be aware that you might get a lot of correlations by chance,
> >particularly as your scores seem to fall into 2 groups: close to
> >zero and < -11. A straight line between two groups gives a good
> >Pearson correlation, as it's sensitive to outliers.
> >
> >Perhaps it's best to change your test scores to factors anyway -
> >Test1.high, Test1.low, Test2.high, Test2.low - and do a conventional
> >limma analysis. Without a regular distribution of test scores,
> >correlations are going to be largely meaningless.
> >
> >HTH, and I hope someone can answer this more definitively.
> >
> >Cheers,
> >Matt
> 


