[BioC] Finding genes with significant linear regression or trend

Tue May 20 16:02:09 CEST 2008

As Sean said, depending on how many time points you have, a linear 
response may not be the best model. In fact, it is unclear that any 
single parametric model will work across all the genes in the data set. 
Typically, you are interested in genes that "increase over time" 
or "decrease over time", without caring about the exact shape of the 
function.  Similar ideas apply to dose response studies. You might want 
to consider an approach using isotonic regression (which addresses 
exactly this question), as described for microarrays in the paper:

Hu J, Kapoor M, Zhang W, Hamilton SR, Coombes KR. Analysis of 
dose-response effects on gene expression data with comparison of two 
microarray platforms. Bioinformatics. 2005 Sep 1;21(17):3524-9.

If you're interested, you can contact Jianhua Hu and ask her for the 
code....
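
For a rough idea of what this looks like in R, here is a minimal sketch 
using isoreg() from the stats package. It is not the method or the code 
from the paper above, just an illustration under assumptions: the data 
are simulated for a single gene, the design (three replicates at each of 
four time points), the variable names and the helper iso_stat() are all 
made up for the example, and a simple permutation p-value stands in for 
a proper test.

```r
set.seed(1)
time <- rep(c(24, 48, 72, 96), each = 3)               # assumed design
expr <- 0.02 * time + rnorm(length(time), sd = 0.3)    # simulated log-expression, one gene

# Hypothetical helper: between-group sum of squares captured by the best
# nondecreasing fit to the time-point means. Using plain isoreg() on the
# means is only appropriate because every time point has the same number
# of replicates (isoreg() does not take weights).
iso_stat <- function(y, x) {
  means <- as.numeric(tapply(y, x, mean))   # means ordered by increasing x
  n <- as.numeric(table(x))                 # replicates per time point
  fit <- isoreg(means)                      # pool-adjacent-violators fit
  sum(n * (fit$yf - mean(y))^2)
}

# Permutation test: shuffle expression values across arrays
obs <- iso_stat(expr, time)
B <- 999
perm <- replicate(B, iso_stat(sample(expr), time))
p_value <- (1 + sum(perm >= obs)) / (B + 1)

iso_fit <- isoreg(as.numeric(tapply(expr, time, mean)))
iso_fit$yf    # fitted nondecreasing means at 24, 48, 72, 96
p_value
```

To look for genes that decrease over time, the same sketch can be run on 
-expr. For a whole array you would loop this over the rows of the 
expression matrix and adjust the p-values for multiple testing.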

Best,
	Kevin

Daniel Brewer wrote:
> 
> Sean Davis wrote:
>> On Mon, May 19, 2008 at 6:39 AM, Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:
>>> What is the best way in bioconductor to find genes that have a
>>> significant trend with a continuous variable, e.g. concentration or time?
>>>  This would be using microarray data and trying to find genes that show
>>> a dose response or a time response.  In the simplest of cases this would
>>> be a linear regression.  For example I have an experiment with time
>>> points 24,48,72,96 and I would like to find genes whose expression
>>> increases with time, i.e. expression is greater at each successive time
>>> point.
>>>
>>> I have looked into trying to do this with limma but the user manual only
>>> seems to deal with time courses with each time being a factor rather
>>> than a continuous variable.
>> Limma will deal with continuous variables just fine.  Just code the
>> variable as a number rather than as a factor, if you have continuous data.
>>
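>> library(limma)
>> genes <- matrix(rnorm(100), ncol=10)  # simulated data: 10 genes x 10 arrays
>> var1 <- rnorm(10)                     # continuous covariate, one value per array
>> df <- data.frame(var1=var1)
>> dm <- model.matrix(~ var1, data=df)   # columns: intercept, slope for var1
>> fit1 <- lmFit(genes, dm)
>> fit2 <- eBayes(fit1)
>> topTable(fit2, coef=2)                # coef=2 tests the slope for var1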
>>
>> However, keep in mind the hypothesis you will be testing--that the
>> gene expression changes are linearly correlated with the variable.
>> While some genes may show this effect, there are probably plenty of
>> other important and interesting genes that will not fit this model.
>> The same reasoning holds for the dose-response relationship; if you
>> are lucky enough (or smart enough) to be on the linear portion of the
>> dose response curve for one gene, you may be very far away from linear
>> for another gene.
>>
>> So, to summarize, be sure that linearity is the appropriate model
>> before applying it; in biology, it might very well not be the correct
>> model for all genes.
>>
>> Sean
> 
> Thanks for that, that's exactly what I needed.  I nearly got to the same
> place by the time this email arrived, but had missed out on the coef=2
> bit.  So I assume that you can fit any regression you like using this
> approach, e.g. if you wanted to fit a quadratic:
>   dm <- model.matrix(~ poly(var1,2), data=df)
> The only problem I see there is which coef you would look at in
> topTable. Any ideas?
> 
> So to summarise:
> 1) Use categorical definitions of time if you want to see if there is
> any change in expression with time.
> 2) Use regression if you want to determine whether genes have a specific
> trend e.g. linear, logarithmic etc.
> 
> Just one more question.  If you had, say, tumour and control experiments,
> is there a way to see if the trends (say linear) are significantly
> different?  Or do contrasts in this situation not make much sense?
> 
> Thanks
>
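
On the two open questions above (which coef to use for a quadratic fit, 
and how to compare trends between tumour and control), one possible way 
to set this up in limma is sketched below. This is only an illustration 
under assumptions: the data are simulated, and the design (two groups, 
four time points, two replicates each) and all object names are invented 
for the example. Passing several coefficients to topTable() gives an 
F-test on them jointly, and in a model with a group-by-time interaction 
the interaction coefficient is the difference in slopes.

```r
library(limma)

set.seed(1)
# Assumed design: control and tumour, 4 time points, 2 replicates each
targets <- expand.grid(rep = 1:2, time = c(24, 48, 72, 96),
                       group = c("control", "tumour"))
targets$group <- factor(targets$group, levels = c("control", "tumour"))

# Simulated log-expression: 50 genes x 16 arrays
genes <- matrix(rnorm(50 * nrow(targets)), nrow = 50)
rownames(genes) <- paste0("gene", 1:50)

# Quadratic trend: test the linear and quadratic terms together (F-test)
dm_quad <- model.matrix(~ poly(time, 2), data = targets)
fit_quad <- eBayes(lmFit(genes, dm_quad))
top_quad <- topTable(fit_quad, coef = 2:3)

# Different linear trends: the interaction term is the tumour slope
# minus the control slope
dm_int <- model.matrix(~ group * time, data = targets)
fit_int <- eBayes(lmFit(genes, dm_int))
top_int <- topTable(fit_int, coef = "grouptumour:time")
```

With simulated noise like this, neither table should contain anything 
convincing; the point is only which coefficients are being tested.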