[BioC] goodness-of-fit for limma
binye at med.wayne.edu
Wed Jun 8 15:47:20 CEST 2005
Thank you very much! I'll check those out. BTW, our dataset is roughly a 3x3 factorial design: three different treatments and three time points. Because of the nature of the cell line we are using, we also have controls (untreated cells at the same time points) for all 3x3 = 9 combinations. So far, I have been taking the ratio of each of the 9 data points to its control and then using limma to test for differential expression. For example (c1 to c9 are the corresponding controls):
        time1     time2     time3
drug1   d1t1/c1   d1t2/c2   d1t3/c3
drug2   d2t1/c4   d2t2/c5   d2t3/c6
drug3   d3t1/c7   d3t2/c8   d3t3/c9
Then I compare all possible pairs. I'm not a statistician, so I'm not sure whether this is a valid way to do the analysis, or whether I should add "controls" to the model as an additional variable. I'm trying to add it to the model, but it's still unclear to me how to construct the matrices. All suggestions will be appreciated.
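To make the two options concrete, here is a minimal sketch of the arithmetic for a single gene. It is plain Python rather than limma code, the expression values are made up, and the names (`treated`, `control`, `log_ratio`, `contrast`) are hypothetical. It only shows that, on the log2 scale, comparing two treated/control ratios gives the same point estimate as keeping all four samples and applying one contrast to them.

```python
import math

# Hypothetical expression values for ONE gene (made-up numbers),
# keyed by (drug, time). Each treated sample has its own control.
treated = {("drug1", "time1"): 800.0, ("drug2", "time1"): 400.0}
control = {("drug1", "time1"): 200.0, ("drug2", "time1"): 200.0}


def log_ratio(key):
    """log2(treated / control) for one drug-by-time cell."""
    return math.log2(treated[key] / control[key])


a, b = ("drug1", "time1"), ("drug2", "time1")

# Route 1: form the ratios first, then compare the two cells.
via_ratios = log_ratio(a) - log_ratio(b)

# Route 2: keep all four samples and apply one contrast
# (+1, -1, -1, +1) to their log2 values.
log2_values = [
    math.log2(treated[a]), math.log2(control[a]),
    math.log2(treated[b]), math.log2(control[b]),
]
contrast = [1, -1, -1, 1]
via_contrast = sum(w * v for w, v in zip(contrast, log2_values))

print(via_ratios, via_contrast)
```

The sketch covers the estimate only. Whether the two routes also give the same standard errors depends on the replication and on how the controls are paired with the treated samples, which it does not address.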
BTW, for finding specific gene profiles that follow the time series, which package would be more appropriate? Thanks!
From: fhong at salk.edu [mailto:fhong at salk.edu]
Sent: Tue 6/7/2005 2:54 PM
To: Ye, Bin
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] goodness-of-fit for limma
> I'm doing some affy microarray analysis using limma, and I'm not a
> statistician. I was told that I need to check whether the model fits the
> data before getting the significant gene lists. So how should I do it in
> limma? And is it really necessary? If not, why?
From a statistical viewpoint, the results (such as the differentially
expressed genes found) are valid only when the model (for example, the
linear model used in limma) is a good approximation of the true
data-generating mechanism. The common check is a residual plot, to see
whether the residuals (the differences between the observed and fitted
values) satisfy the model assumptions. You would extract the residuals
from the lmFit fit.
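As a rough illustration of what a residual is, here is a minimal sketch for one gene. It is plain Python rather than limma code, the values are made up, and the group names are hypothetical. It assumes a one-factor model, where the least-squares fitted value for each sample is simply its group mean.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene:
# three replicate arrays per group (made-up numbers).
observed = {
    "control": [7.9, 8.1, 8.0],
    "treated": [9.8, 10.3, 9.9],
}

# In a one-factor linear model, the least-squares fitted value
# for every sample in a group is that group's mean.
fitted = {group: mean(values) for group, values in observed.items()}

# Residual = observed value minus fitted value.
residuals = {
    group: [y - fitted[group] for y in values]
    for group, values in observed.items()
}

for group, res in residuals.items():
    print(group, [round(r, 2) for r in res])
```

Plotting such residuals against the fitted values, or in a normal quantile plot, is the usual way to look for systematic departures from the model assumptions.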
However, I don't see many people doing this when identifying genes. If you
are worried that a linear model might not explain the data well, you can
use non-parametric methods instead, such as RankProd and siggenes.
Hope this helps.
> Thanks a lot!
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
Fangxin Hong Ph.D.
Plant Biology Laboratory
The Salk Institute
10010 N. Torrey Pines Rd.
La Jolla, CA 92037
E-mail: fhong at salk.edu
(Phone): 858-453-4100 ext 1105