[BioC] How to determine if clinical variables are responsible for gene expression with limma
Jon Manning
jmanning at staffmail.ed.ac.uk
Mon Apr 27 14:52:48 CEST 2009
Hello,
I'm new to the Bioconductor list, and fairly new to Bioconductor itself,
so excuse me if the following is a stupid question. I've been looking
around the list and documentation for a while without finding my answer.
The short version of my question is "What is the most appropriate way to
determine if microarray-derived gene expression is associated with any
of a number of continuous and discrete clinical variables, independent
of patient group or treatment type?".
The long version, with my attempt at this analysis, is as follows:
I'm currently analysing a single-channel Agilent microarray data set
involving 29 patients in three clinical groups. I've been using limma,
and I think I have the method right for comparing those groups, like so:
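library(limma)
# Clinical group (1, 2 or 3), one entry per patient (29 in total)
clinical_group <- c(3,2,1,1,2,3,3,1,2,1,2,3,1,3,1,2,2,1,1,3,2,3,2,1,1,2,3,2,3)
# One column per group and no intercept, so each coefficient is a group mean
design <- model.matrix(~ 0+factor(clinical_group))
colnames(design) <- c("one", "two", "three")
fit <- lmFit(esetPROC, design)
# Pairwise differences between the group means
comparisons <- c("one-three", "one-two", "three-two")
contrast.matrix <- makeContrasts(contrasts=comparisons, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)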
where esetPROC is an expression set object containing normalised
and corrected expression values.
However, I also have a number of continuous and discrete clinical
variables associated with these patients. I'm interested in seeing if
any of these variables are associated with high or low gene expression.
Referring to this thread...
http://thread.gmane.org/gmane.science.biology.informatics.conductor/11402/focus=11409
... I attempted to do this with a design in limma in the following manner:
design <- model.matrix(~ 0+var1+var2+var3)
fit <- lmFit(esetPROC, design)
fit2 <- eBayes(fit)
where var1 etc. are continuous clinical variables. When I use all the
variables together, I get very few probes significantly associated with
any of them. However, if I use only one variable at a time, every
variable (even nonsensical ones such as the day of the month a
patient was born) seems to produce hundreds or thousands of probes with
significant adjusted p-values. I assume this is because I'm
fundamentally misunderstanding something that's going on here (I'm not
a mathematician) and misapplying the method.
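To make the single-variable behaviour concrete, here is a minimal sketch with simulated data in base R (lm rather than limma, so that it runs without my data set). Everything in it is an assumption made up for illustration: birth_day, expr and the probe count are hypothetical and do not come from my arrays. It fits a nonsense covariate to expression values that do not depend on it, once with the "0 +" (no intercept) form I used above and once with an intercept. I suspect, but am not sure, that the missing intercept is related to what I'm seeing.

```r
# Simulated data only: nothing here comes from the real data set.
set.seed(1)
n_patients <- 29
n_probes <- 1000

# A nonsense covariate: day of the month each (hypothetical) patient was born.
birth_day <- sample(1:28, n_patients, replace = TRUE)

# Log-scale expression with a non-zero mean (about 8) and no true
# dependence on birth_day. Rows are probes, columns are patients.
expr <- matrix(rnorm(n_probes * n_patients, mean = 8, sd = 1),
               nrow = n_probes, ncol = n_patients)

# Per-probe p-value for the birth_day coefficient, without an intercept
# (the same form as my design above).
p_no_intercept <- apply(expr, 1, function(y) {
  summary(lm(y ~ 0 + birth_day))$coefficients["birth_day", "Pr(>|t|)"]
})

# The same test, but with an intercept in the model.
p_with_intercept <- apply(expr, 1, function(y) {
  summary(lm(y ~ birth_day))$coefficients["birth_day", "Pr(>|t|)"]
})

# Number of probes passing a 5% Benjamini-Hochberg threshold in each case.
n_sig_no_intercept <- sum(p.adjust(p_no_intercept, method = "BH") < 0.05)
n_sig_with_intercept <- sum(p.adjust(p_with_intercept, method = "BH") < 0.05)

print(c(no_intercept = n_sig_no_intercept,
        with_intercept = n_sig_with_intercept))
```

If the two counts differ greatly, that would suggest the no-intercept fit is picking up the overall expression level rather than any relationship with the variable, but I'd welcome confirmation that this reasoning carries over to lmFit and eBayes.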
I'd appreciate any pointers as to where I'm going wrong here, and
where my misconceptions may lie.
Regards,
Jon Manning