[BioC] Finding genes with significant linear regression or trend

Mon May 19 13:45:53 CEST 2008

Sean Davis wrote:
> On Mon, May 19, 2008 at 6:39 AM, Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:
>> What is the best way in bioconductor to find genes that have a
>> significant trend with a continuous variable e.g. concentration or time.
>>  This would be using microarray data and trying to find genes that show
>> a dose response or a time response.  In the simplest of cases this would
>> be a linear regression.  For example I have an experiment with time
>> points 24,48,72,96 and I would like to find genes whose expression
>> increases with time, i.e. expression is greater at each successive time
>> point.
>>
>> I have looked into trying to do this with limma but the user manual only
>> seems to deal with time courses with each time being a factor rather
>> than a continuous variable.
> 
> Limma will deal with continuous variables just fine.  Just change the
> value of the factor to a number, if you have continuous data.
> 
> library(limma)
> genes <- matrix(rnorm(100),ncol=10)
> var1 <- rnorm(10)
> df <- data.frame(var1=var1)
> dm <- model.matrix(~ var1, data=df)
> fit1 <- lmFit(genes,dm)
> fit2 <- eBayes(fit1)
> topTable(fit2,coef=2)
> 
> However, keep in mind the hypothesis you will be testing--that the
> gene expression changes are linearly correlated with the variable.
> While some genes may show this effect, there are probably plenty of
> other important and interesting genes that will not fit this model.
> The same reasoning holds for the dose-response relationship; if you
> are lucky enough (or smart enough) to be on the linear portion of the
> dose response curve for one gene, you may be very far away from linear
> for another gene.
> 
> So, to summarize, be sure that linearity is the appropriate model
> before applying it; in biology, it might very well not be the correct
> model for all genes.
> 
> Sean

Thanks for that, that's exactly what I needed.  I had nearly got to the
same place by the time this email arrived, but had missed the coef=2
bit.  So I assume that you can fit any regression you like using this
approach, e.g. to fit a quadratic:
dm <- model.matrix(~ poly(var1,2), data=df).  The only problem I see
there is which coef you would look at in topTable.  Any ideas?
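
One possible answer, as a minimal sketch only: with poly(x, 2) the design
matrix has an intercept in column 1, the linear term in column 2 and the
quadratic term in column 3, so coef=3 would test the curvature alone, and
passing both columns (coef=2:3) asks topTable for a moderated F-test of any
trend.  The data here are simulated, and the names time, tt_quad and tt_any
are made up for illustration.

```r
library(limma)

set.seed(1)
# Simulated data for illustration only: 100 genes x 12 arrays,
# three replicate arrays at each of the four time points
time  <- rep(c(24, 48, 72, 96), each = 3)
genes <- matrix(rnorm(100 * 12), nrow = 100)

# Orthogonal polynomial of degree 2:
# column 1 = intercept, column 2 = linear term, column 3 = quadratic term
dm  <- model.matrix(~ poly(time, 2))
fit <- eBayes(lmFit(genes, dm))

# coef = 3: moderated t-test on the quadratic term alone
tt_quad <- topTable(fit, coef = 3, number = Inf)

# coef = 2:3: moderated F-test for any trend (linear and/or quadratic)
tt_any <- topTable(fit, coef = 2:3, number = Inf)
```

Which of the two is appropriate depends on whether the question is "is
there curvature?" or "is there any time trend at all?".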

So to summarise:
1) Use categorical definitions of time if you want to see if there is
any change in expression with time.
2) Use regression if you want to determine whether genes have a specific
trend e.g. linear, logarithmic etc.
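
The two options above could be set up roughly as follows.  This is a
sketch on simulated data, not a recommended pipeline; the object names
(tf, tt_cat, tt_lin, increasing) and the 0.05 cutoff are assumptions made
for the example.

```r
library(limma)

set.seed(2)
# Simulated data for illustration only: 100 genes x 12 arrays
time  <- rep(c(24, 48, 72, 96), each = 3)
genes <- matrix(rnorm(100 * 12), nrow = 100)

# 1) Time as a factor: one coefficient per non-baseline time point.
#    Testing coefficients 2:4 together asks "any change with time?"
tf      <- factor(time)
dm_cat  <- model.matrix(~ tf)
fit_cat <- eBayes(lmFit(genes, dm_cat))
tt_cat  <- topTable(fit_cat, coef = 2:4, number = Inf)

# 2) Time as a number: a single slope coefficient.
#    logFC is then the estimated change in expression per unit of time
dm_lin  <- model.matrix(~ time)
fit_lin <- eBayes(lmFit(genes, dm_lin))
tt_lin  <- topTable(fit_lin, coef = "time", number = Inf)

# Genes with a significantly positive slope (cutoff chosen arbitrarily)
increasing <- tt_lin[tt_lin$logFC > 0 & tt_lin$adj.P.Val < 0.05, ]
```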

Just one more question.  If you had, say, tumour and control experiments,
is there a way to see if the trends (say linear) are significantly
different?  Or do contrasts in this situation not make much sense?
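
One way this might be approached, assuming a linear trend in both groups,
is to fit a group-by-time interaction and test the interaction
coefficient, which is the difference in slope between tumour and control.
The sketch below uses simulated data; the targets data frame, its column
names and the group labels are hypothetical.

```r
library(limma)

set.seed(3)
# Hypothetical sample sheet: two groups, four time points, two replicates
targets <- data.frame(
  group = factor(rep(c("control", "tumour"), each = 8),
                 levels = c("control", "tumour")),
  time  = rep(rep(c(24, 48, 72, 96), each = 2), times = 2)
)
# Simulated data for illustration only: 100 genes x 16 arrays
genes <- matrix(rnorm(100 * 16), nrow = 100)

# Columns: (Intercept), grouptumour, time, grouptumour:time
# "time"             = slope in the control group
# "grouptumour:time" = tumour slope minus control slope
dm  <- model.matrix(~ group * time, data = targets)
fit <- eBayes(lmFit(genes, dm))

# Genes whose linear trend differs between tumour and control
tt_diff <- topTable(fit, coef = "grouptumour:time", number = Inf)
```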

Thanks

-- 
**************************************************************

Daniel Brewer

Institute of Cancer Research
Molecular Carcinogenesis
MUCRC
15 Cotswold Road
Sutton, Surrey SM2 5NG
United Kingdom

Tel: +44 (0) 20 8722 4109
Fax: +44 (0) 20 8722 4141

Email: daniel.brewer at icr.ac.uk

**************************************************************
