[BioC] goodness-of-fit for limma
fhong at salk.edu
Wed Jun 8 18:55:54 CEST 2005
> Dear Fangxin,
> Thank you very much! I'll check those out. BTW, our dataset is
> kind of a 3x3 factorial design, where we have three different treatments and
> three time points, but because of the nature of the cell line that we are
> using, we have controls (untreated cells at the same time points) for all
> the 3x3 = 9 different combinations. So far, I'm doing my analysis by first
> taking the ratio of each of the 9 data points to its control, then using
> limma to test for differential expression. Example as follows (c1 to c9 are
> the corresponding controls):
>          time1     time2     time3
> drug1    d1t1/c1   d1t2/c2   d1t3/c3
> drug2    d2t1/c4   d2t2/c5   d2t3/c6
> drug3    d3t1/c7   d3t2/c8   d3t3/c9
> Then I compare all the possible pairs. I'm not a statistician, so I'm not
> sure whether this is a valid way to do the analysis, or whether I should add
> the dependent variable "controls" to the model. I'm trying to add it to the
> model, but so far it's still unclear to me how to set up the matrices. All
> suggestions will be appreciated.
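The ratio step described above can be sketched as follows. This is a minimal Python/numpy illustration, not limma code; it assumes the treated and control intensities are stored as two matrices with one row per gene and one column per cell of the 3x3 table, and the function name `log2_ratios` and its `pseudo` offset are hypothetical.

```python
import numpy as np


def log2_ratios(treated, controls, pseudo=0.0):
    """Per-gene log2(treated / matched control).

    treated, controls: arrays of shape (genes, 9), where column j of
    `controls` is the untreated sample matched to column j of `treated`
    (c1..c9 in the table above). `pseudo` is a hypothetical offset to
    avoid log(0) on raw intensities.
    """
    treated = np.asarray(treated, dtype=float)
    controls = np.asarray(controls, dtype=float)
    if treated.shape != controls.shape:
        raise ValueError("treated and controls must have the same shape")
    # log2(a / b) computed as a difference of logs
    return np.log2(treated + pseudo) - np.log2(controls + pseudo)
```

If the expression values are already on the log2 scale (as RMA output is), the ratio is simply the difference of the two matrices, with no further log transform.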
First, I think you should have a clear idea of what questions you want
to answer by analyzing the data. Comparing all possible pairs sounds
questionable; for example, what would d1t1/c1 vs. d2t2/c5 tell you?
It also depends on your sampling procedure. If the 9 controls are
independent of each other (I assume the samples in your 3x3 table are
independent), I don't see why you need to include the variable "controls"
in your model. It will then be a typical 3x3 factorial design.
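A minimal sketch, in Python with numpy rather than R, of the design matrix for a typical 3x3 factorial design with treatment (reference-level) coding. It assumes the response for each array is its log ratio against the matched control, and that drug1 and time1 are the reference levels; the function name `factorial_design` and the `reps` parameter are hypothetical.

```python
import itertools

import numpy as np

DRUGS = ["drug1", "drug2", "drug3"]
TIMES = ["time1", "time2", "time3"]


def factorial_design(reps=1):
    """Return (row labels, column names, design matrix) for a 3x3
    factorial with treatment coding (drug1 and time1 as references)."""
    names = (["Intercept"] + DRUGS[1:] + TIMES[1:]
             + [f"{d}:{t}" for d in DRUGS[1:] for t in TIMES[1:]])
    rows, labels = [], []
    for d, t in itertools.product(DRUGS, TIMES):
        for _ in range(reps):
            # 0/1 indicators for the non-reference levels
            drug = [1.0 if d == lvl else 0.0 for lvl in DRUGS[1:]]
            time = [1.0 if t == lvl else 0.0 for lvl in TIMES[1:]]
            # interaction columns are products of the indicators
            inter = [a * b for a in drug for b in time]
            rows.append([1.0] + drug + time + inter)
            labels.append(f"{d}/{t}")
    return labels, names, np.array(rows)
```

In R, `model.matrix(~ drug * time)` with factor variables and the default treatment contrasts gives an equivalent coding, although the column order may differ. With a single array per cell (`reps=1`) the matrix is 9x9, so the full factorial model has no residual degrees of freedom; replicates are needed to estimate variability.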
> BTW, for finding the specific gene profiles that follow the time series,
> which package would be more appropriate? Thanks!
Do you mean finding different patterns (time-dependent profiles) of gene
expression? I don't know of any good package for doing that; please let me
know if there is one. But with only three time points, the patterns will be
really simple, since no smooth curve can be assumed for the gene expression
profile. There is only a limited number of patterns. For example, if you rank
a gene's expression at the three time points by value as 1 (smallest),
2 (medium) and 3 (largest), you may have
time1  time2  time3
  1      2      3
  2      1      3
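The ranking idea above can be sketched in a few lines of Python. This is an illustration only: it assumes each gene has one summarized value per time point, breaks ties by time order purely to keep the code short, and the names `rank_pattern`, `ALL_PATTERNS` and `group_by_pattern` are hypothetical.

```python
from itertools import permutations

import numpy as np


def rank_pattern(values):
    """Rank pattern (1 = smallest, 3 = largest) of one gene's expression
    at the three time points; ties are broken by time order."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return tuple(int(r) for r in ranks)


# With three time points there are 3! = 6 possible patterns.
ALL_PATTERNS = sorted(permutations((1, 2, 3)))


def group_by_pattern(expr):
    """Map each pattern to the row indices (genes) that follow it."""
    groups = {p: [] for p in ALL_PATTERNS}
    for g, row in enumerate(np.asarray(expr, dtype=float)):
        groups[rank_pattern(row)].append(g)
    return groups
```

For example, `rank_pattern([5.0, 3.0, 9.0])` gives `(2, 1, 3)`, the second row of the table. Ranks ignore the size of the differences, so a nearly flat gene still gets a pattern; it probably makes most sense to classify only genes already found to change over time.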
If this is what you want, go and find a paper in Bioinformatics (I forgot
Hope this helps
> -----Original Message-----
> From: fhong at salk.edu [mailto:fhong at salk.edu]
> Sent: Tue 6/7/2005 2:54 PM
> To: Ye, Bin
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] goodness-of-fit for limma
> Hi, Bin
>> I'm doing some affy microarray analysis using limma, and I'm not a
>> statistician. I was told that I need to check whether the model fits the
>> data before getting the significant gene lists. So how should I do it in
>> limma? Is it really necessary? If not, why?
> From a statistical viewpoint, the results (like the differential genes
> found) are valid only when the model (for example, the linear model used in
> limma) is a good approximation of the true data-generating mechanism. The
> common check is residual plots, to see whether the residuals (differences
> between the observed values and the fitted values) satisfy the model
> assumptions. You can extract the residuals from the lmFit results.
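lmFit fits the same linear model to every gene separately, by default using least squares. A minimal numpy sketch of that idea, and of the residuals (observed minus fitted) one would plot, follows; it is an illustration in Python, not limma's implementation, and the function name `genewise_residuals` is hypothetical. In R, limma provides a `residuals()` method for `lmFit` results that takes the original expression data as its second argument.

```python
import numpy as np


def genewise_residuals(y, X):
    """Least-squares fit of the same design X to every gene (row of y).

    y: (genes, samples) log-expression matrix; X: (samples, coefs).
    Returns (coefficients, fitted, residuals), residuals = observed - fitted.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Solve X b = y_g for all genes at once by passing y transposed.
    coef, *_ = np.linalg.lstsq(X, y.T, rcond=None)
    fitted = (X @ coef).T
    return coef.T, fitted, y - fitted
```

For a residual plot, one could scatter `fitted.ravel()` against the residuals and look for trends or funnel shapes, and a normal Q-Q plot of the residuals checks the normality assumption. If the model has as many coefficients as arrays, all residuals are zero and there is nothing to check.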
> However, I don't see many people doing this when identifying genes. If you
> are worried that the linear model might not explain the data well, you can
> turn to non-parametric methods, like RankProd and siggenes.
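To give a feel for the rank-product idea behind RankProd, here is a simplified Python sketch: genes are ranked by log fold change within each replicate (rank 1 = most up-regulated) and each gene's ranks are combined by a geometric mean, so small values mean consistently up-regulated. This is not the RankProd implementation; the package also estimates significance by permutation, which is omitted here, and `rank_product` is a hypothetical name.

```python
import numpy as np


def rank_product(log_ratios):
    """Geometric mean over replicates of each gene's rank.

    log_ratios: (genes, replicates) array of log fold changes.
    Within each replicate, rank 1 = largest (most up-regulated) value.
    Ties are broken by row order, a simplification.
    """
    lr = np.asarray(log_ratios, dtype=float)
    n_genes, n_reps = lr.shape
    ranks = np.empty_like(lr)
    for j in range(n_reps):
        order = np.argsort(-lr[:, j], kind="stable")
        ranks[order, j] = np.arange(1, n_genes + 1)
    # Geometric mean computed via logs to avoid overflow with many replicates
    return np.exp(np.log(ranks).mean(axis=1))
```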
> Hope this will help
>> Thanks a lot!
> Fangxin Hong Ph.D.
> Plant Biology Laboratory
> The Salk Institute
> 10010 N. Torrey Pines Rd.
> La Jolla, CA 92037
> E-mail: fhong at salk.edu
> (Phone): 858-453-4100 ext 1105