[BioC] Finding genes with significant linear regression or trend

Daniel Brewer daniel.brewer at icr.ac.uk
Mon May 19 13:45:53 CEST 2008

Sean Davis wrote:
> On Mon, May 19, 2008 at 6:39 AM, Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:
>> What is the best way in Bioconductor to find genes that have a
>> significant trend with a continuous variable, e.g. concentration or time?
>> This would be using microarray data and trying to find genes that show
>> a dose response or a time response.  In the simplest case this would
>> be a linear regression.  For example, I have an experiment with time
>> points 24, 48, 72 and 96, and I would like to find genes whose expression
>> increases with time, i.e. expression is higher at each successive time
>> point.
>> I have looked into doing this with limma, but the user manual only
>> seems to deal with time courses where time is a factor rather
>> than a continuous variable.
> Limma will deal with continuous variables just fine.  If your data are
> continuous, just code the variable as a number rather than a factor.
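> library(limma)
> set.seed(1)                                 # reproducible example
> genes <- matrix(rnorm(100), ncol = 10)      # 10 genes x 10 arrays
> df <- data.frame(var1 = rnorm(10))          # one continuous covariate per array
> dm <- model.matrix(~ var1, data = df)       # columns: (Intercept), var1
> fit1 <- lmFit(genes, dm)
> fit2 <- eBayes(fit1)
> topTable(fit2, coef = 2)                    # coef 2 = the slope for var1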
> However, keep in mind the hypothesis you will be testing: that the
> gene expression changes are linearly related to the variable.
> While some genes may show this effect, there are probably plenty of
> other important and interesting genes that will not fit this model.
> The same reasoning holds for the dose-response relationship; if you
> are lucky enough (or smart enough) to be on the linear portion of the
> dose-response curve for one gene, you may be very far from linear
> for another gene.
> So, to summarize, be sure that linearity is the appropriate model
> before applying it; in biology, it might very well not be the correct
> model for all genes.
> Sean
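A minimal sketch of Sean's suggestion applied to the design in the original question, assuming limma is installed and using simulated data: two hypothetical arrays per time point at 24, 48, 72 and 96, with time entered as a number. The matrix `expr`, the trend added to the first 20 genes and the 0.05 cut-off are illustrative assumptions, not real data. Keeping significant genes with a positive slope is one way to approximate "expression increases with time"; a positive linear slope does not by itself guarantee a rise at every successive time point.

```r
## Sketch only: simulated data, hypothetical design with 2 arrays per time point
library(limma)

set.seed(1)
time <- rep(c(24, 48, 72, 96), each = 2)          # numeric covariate, not a factor
expr <- matrix(rnorm(1000 * length(time)), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), NULL))

## Hypothetical: give the first 20 genes an upward linear trend
expr[1:20, ] <- expr[1:20, ] +
  matrix(0.1 * time, nrow = 20, ncol = length(time), byrow = TRUE)

design <- model.matrix(~ time)                    # columns: (Intercept), time
fit <- eBayes(lmFit(expr, design))

## The "time" coefficient is the slope: change in log-expression per unit of time
tt <- topTable(fit, coef = "time", number = Inf)

## Keep significant genes whose slope is positive (expression rises with time)
up <- tt[tt$adj.P.Val < 0.05 & tt$logFC > 0, ]
head(up)
```

For a single coefficient, the `logFC` column that `topTable()` reports is the estimated slope, so its sign gives the direction of the trend.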

Thanks, that's exactly what I needed.  I had nearly got to the same
place by the time this email arrived, but had missed the coef=2
bit.  So I assume you can fit any regression you like using this
approach, e.g. for a quadratic:
dm <- model.matrix(~ poly(var1, 2), data=df)
The only problem I see is which coef to look at in topTable.  Any ideas?
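One possible answer to the coef question, sketched under assumptions (simulated data; `expr`, `time` standing in for `var1`, and the curved genes are all hypothetical): `poly(time, 2)` adds two orthogonal columns, so the design has an intercept plus a linear and a quadratic term. Passing both columns to `topTable()` (`coef = 2:3`) makes limma report a moderated F-test of whether either term is non-zero, while `coef = 3` tests the curvature alone.

```r
## Sketch only: simulated data, hypothetical design
library(limma)

set.seed(2)
time <- rep(c(24, 48, 72, 96), each = 2)
expr <- matrix(rnorm(500 * length(time)), nrow = 500)

## Hypothetical: genes 1-10 are highest in the middle of the time course
expr[1:10, ] <- expr[1:10, ] -
  matrix(0.002 * (time - 60)^2, nrow = 10, ncol = length(time), byrow = TRUE)

df <- data.frame(time = time)
design <- model.matrix(~ poly(time, 2), data = df)  # intercept + linear + quadratic
fit <- eBayes(lmFit(expr, design))

## Moderated F-test: is either polynomial term non-zero (any linear or curved trend)?
any_trend <- topTable(fit, coef = 2:3, number = Inf)

## Moderated t-test on the quadratic term only (curvature)
curved <- topTable(fit, coef = 3, number = Inf)
head(any_trend)
```

Because `poly()` uses orthogonal polynomials, the individual coefficients are on a rescaled basis rather than raw time and time squared; the joint F-test on `coef = 2:3` is unaffected by that choice.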

So to summarise:
1) Code time as a categorical variable (a factor) if you want to see
whether there is any change in expression with time.
2) Use regression if you want to determine whether genes follow a
specific trend, e.g. linear or logarithmic.
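The two options in the summary amount to two different design matrices for the same experiment. This is a sketch assuming limma, simulated data and the hypothetical 24/48/72/96 design from the question: with time as a factor there are three coefficients after the intercept, and testing them together asks whether expression changes at all, whereas a numeric (or log-transformed) covariate gives a single slope for one specific trend shape.

```r
## Sketch only: simulated data, hypothetical design
library(limma)

set.seed(3)
time <- rep(c(24, 48, 72, 96), each = 2)
expr <- matrix(rnorm(500 * length(time)), nrow = 500)

## 1) Time as a factor: one coefficient per later time point relative to the first
design_cat <- model.matrix(~ factor(time))
fit_cat <- eBayes(lmFit(expr, design_cat))
any_change <- topTable(fit_cat, coef = 2:4, number = Inf)  # F-test: any difference?

## 2) Time as a number: a single linear slope
design_lin <- model.matrix(~ time)
fit_lin <- eBayes(lmFit(expr, design_lin))
linear <- topTable(fit_lin, coef = 2, number = Inf)

## A logarithmic trend is the same idea with a transformed covariate
design_log <- model.matrix(~ log(time))
fit_log <- eBayes(lmFit(expr, design_log))
logtrend <- topTable(fit_log, coef = 2, number = Inf)
```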

One more question: if you had, say, tumour and control experiments,
is there a way to test whether the trends (say, linear) are significantly
different?  Or do contrasts not make much sense in this situation?
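One common way to compare trends between groups, sketched here under assumptions (simulated data, hypothetical `group` and `time` vectors, linear trends in both groups): fit an interaction model. With R's default treatment coding and control as the reference level, the `grouptumour:time` coefficient is the tumour slope minus the control slope, so a moderated t-test on it asks whether the two trends differ.

```r
## Sketch only: simulated data, hypothetical tumour/control time course
library(limma)

set.seed(4)
group <- factor(rep(c("control", "tumour"), each = 8),
                levels = c("control", "tumour"))            # control = reference
time  <- rep(rep(c(24, 48, 72, 96), each = 2), times = 2)
expr  <- matrix(rnorm(500 * length(time)), nrow = 500)

## Hypothetical: genes 1-10 rise with time in tumour samples only
tum <- group == "tumour"
expr[1:10, tum] <- expr[1:10, tum] +
  matrix(0.1 * time[tum], nrow = 10, ncol = sum(tum), byrow = TRUE)

## Columns: (Intercept), grouptumour, time, grouptumour:time
design <- model.matrix(~ group * time)
fit <- eBayes(lmFit(expr, design))

## Interaction coefficient = tumour slope minus control slope
slope_diff <- topTable(fit, coef = "grouptumour:time", number = Inf)
head(slope_diff)
```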



Daniel Brewer

Institute of Cancer Research
Molecular Carcinogenesis
15 Cotswold Road
Sutton, Surrey SM2 5NG
United Kingdom

Tel: +44 (0) 20 8722 4109
Fax: +44 (0) 20 8722 4141

Email: daniel.brewer at icr.ac.uk


