[BioC] when do linear models work?

Thu Mar 4 19:48:57 MET 2004

Hello All,

I have two fundamental problems with linear models (lm); maybe you can help
me clarify these issues:

1. Irrespective of how many factors you use in your experiment, the
relationship is always assumed to be linear. If you have a response vector Y
and a vector X of independent variables, then Y ~ X basically assumes a
straight line (with some kind of slope). If you fit, say, Y ~ X + Z, then one
can think of the lm as a *flat* surface. The same is true for higher
dimensions (Y ~ dose + time + batch + gender + ... )

This assumption is really dangerous, I think, since many treatment/response
relationships are not linear. For example, think about an experiment: I have
5 doses (0.0mM, 0.10mM, 0.25mM, 0.5mM and 1.0mM) of a drug with which cell
cultures get treated. The 0.1mM dose causes hardly any change in gene
expression, whereas there's a big difference in gene expression at 0.25mM.
Then at 0.5mM and 1.0mM the response is not much stronger than at 0.25mM.

If one just looks at a single gene, then expression of this gene goes up
quite strongly from 0.1mM to 0.25mM, and then expression flattens out for the
higher doses. The response reaches saturation. Other responses are more like
a logistic curve. This is a typical scenario.
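
To make the scenario concrete, here is a minimal sketch in plain Python. The
doses are the ones from the example above, but the expression values are made
up purely for illustration (they are not measured data), and fit_line is a
hypothetical helper, not a function from any package. It fits a straight line
expression ~ dose by ordinary least squares and prints the residuals, so one
can see how a saturating response shows up as a systematic pattern in them.

```python
# Doses from the example above (mM).
doses = [0.0, 0.10, 0.25, 0.5, 1.0]
# ASSUMPTION: invented expression values for one gene (arbitrary units),
# chosen only to mimic "flat, then a jump, then saturation".
expr = [1.0, 1.1, 3.0, 3.2, 3.3]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(doses, expr)
fitted = [a + b * d for d in doses]
residuals = [y - f for y, f in zip(expr, fitted)]

for d, y, f, r in zip(doses, expr, fitted, residuals):
    print(f"dose={d:4.2f}  observed={y:4.2f}  fitted={f:4.2f}  residual={r:+5.2f}")
```

With these invented numbers the straight line overshoots at the lowest and
highest doses and undershoots in between, which is the kind of misfit I am
worried about.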

The problem is that many genes within one experiment behave as described
above, while others change linearly and others exponentially ...

Could I still use lm for this kind of experiment? Would I have to decide on a
gene-by-gene basis?

2. Some of the factors, such as treatment (T), can only take, say, 2 distinct
values in an experiment: treated (t) and untreated (ut). Does a model such as
Y ~ T make any sense in this case?

Doesn't this assume a linear relationship between just 2 "clouds" of data
(assume there are many samples for each factor level)? Even if one can
clearly distinguish between t and ut, assuming a straight line may be wrong.
This is like drawing a straight line between two points. Just as in my
example above with the different doses, you may have already reached some
kind of saturation. Using such a model for prediction would then give wrong
results.
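
As a small check of what Y ~ T amounts to numerically, here is a sketch in
plain Python under two assumptions: T is coded as 0 for ut and 1 for t, and
the values are invented for illustration only. fit_line is again a
hypothetical helper. With this coding the least-squares intercept equals the
mean of the ut group and the slope equals the difference between the two
group means. Whether that makes the model suitable for prediction is a
separate question that this sketch does not answer.

```python
# ASSUMPTION: invented expression values, four samples per group.
untreated = [2.1, 1.9, 2.0, 2.2]
treated = [3.0, 3.3, 2.9, 3.2]

# ASSUMPTION: the two-level factor T is coded 0 = ut, 1 = t.
t = [0] * len(untreated) + [1] * len(treated)
y = untreated + treated


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(t, y)
mean_ut = sum(untreated) / len(untreated)
mean_t = sum(treated) / len(treated)

print(f"intercept = {a:.3f}   mean(ut)            = {mean_ut:.3f}")
print(f"slope     = {b:.3f}   mean(t) - mean(ut)  = {mean_t - mean_ut:.3f}")
```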

However, if one just wants to distinguish between t and ut, would the lm be a
valid method?

I'm reading some "beginners" literature about lm's, and I'm just trying to
understand what's going on ...

Maybe you could comment on this. I'd be very interested in any explanation or
clarification.

	kind regards,

	Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com