[BioC] Differential expression

Wed May 31 14:30:28 CEST 2006

Hi all,

I think there are two general issues here:

1. The risk of over-normalizing and washing out biologically real
differences grows with the flexibility of the normalization method,
i.e. with its (effective) number of parameters. E.g. if you use loess,
a large bandwidth (small number of effective parameters) is better.
The vsn method uses 2 parameters per array (scale factor and offset).
I don't know whether or how one can quantify the effective number of
parameters for quantile normalization - does anyone know?

2. There remains the question of how to estimate the normalization
parameters correctly. There are two guidelines:
- use a robust method with a high breakdown point (the goal being that
it is not affected by the differentially expressed genes, which act as
outliers)
- if at all possible, narrow down the probes from which you fit the
parameters from all genes (incl. the differential ones) to a subset
that is enriched for non-changing ones (e.g. spike-ins, or known
negative controls)
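
The two guidelines under point 2 can be sketched with a toy calculation.
This is only an illustration in plain Python (standard library), not the
procedure of any particular package: the numbers are made up, a single
offset per array stands in for the normalization parameters, and the
"control" probes are hypothetical.

```python
from statistics import mean, median

# Made-up log-intensity differences (array B minus array A) for 10 probes.
# Assumption: the true technical offset is 1.0, and the last 3 probes are
# genuinely up-regulated by 4.0 (all changes go in the same direction).
TRUE_OFFSET = 1.0
diffs = [1.0] * 7 + [5.0] * 3

# Hypothetical negative controls: probes 0-3 are known not to change.
control_idx = [0, 1, 2, 3]

# Non-robust estimate: pulled away from 1.0 by the 3 changed probes.
offset_mean = mean(diffs)
# Robust estimate: still 1.0, because fewer than half of the probes changed.
offset_median = median(diffs)
# Estimate restricted to the control subset.
offset_controls = median([diffs[i] for i in control_idx])

# If 6 of the 10 probes change in the same direction, the median itself
# breaks down, while the control-based estimate is unaffected.
diffs_many = [1.0] * 4 + [5.0] * 6
offset_median_many = median(diffs_many)
offset_controls_many = median([diffs_many[i] for i in control_idx])

print("30% changed:", offset_mean, offset_median, offset_controls)
print("60% changed:", offset_median_many, offset_controls_many)
```

The median tolerates the unbalanced changes only while they affect fewer
than half of the probes; restricting the fit to probes believed not to
change removes that limit, provided the controls really are unchanged.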

[ There is some discussion of the robustness issue (incl. simulations to
show the effect of unbalanced numbers of "ups" and "downs") in
Stat Appl Genet Mol Biol. 2003;2(1):Article3. PMID: 16646781 ]
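
Going back to point 1, here is a deliberately contrived sketch of how a
very flexible method can wash out a real difference. It compares a
one-parameter-per-array offset (subtracting the array median) with a
bare-bones quantile normalization. The data are made up, the function
names are hypothetical, and the quantile code ignores ties; it is not the
implementation used by any Bioconductor package.

```python
from statistics import median


def offset_normalize(arrays):
    """One parameter per array: subtract each array's own median."""
    return [[x - median(a) for x in a] for a in arrays]


def quantile_normalize(arrays):
    """Force all arrays onto the same empirical distribution (no ties)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    ref = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normed = [0.0] * n
        for rank, idx in enumerate(order):
            normed[idx] = ref[rank]
        result.append(normed)
    return result


# Made-up log intensities for 10 genes on two arrays. On array B the three
# brightest genes are truly up by 2.0; nothing else differs.
array_a = [float(v) for v in range(1, 11)]
array_b = [v + (2.0 if i >= 7 else 0.0) for i, v in enumerate(array_a)]

off_a, off_b = offset_normalize([array_a, array_b])
qn_a, qn_b = quantile_normalize([array_a, array_b])

diff_offset = [b - a for a, b in zip(off_a, off_b)]
diff_quantile = [b - a for a, b in zip(qn_a, qn_b)]

print("offset:  ", diff_offset)
print("quantile:", diff_quantile)
```

In this toy case the gene ranks are identical on both arrays, so quantile
normalization maps them to exactly the same values and the true change
disappears, while the single-offset method leaves it intact. Real data
are far less extreme, but the direction of the effect is the same.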

Best wishes
 Wolfgang.

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber

Naomi Altman wrote:
> I cannot imagine that cyclic loess would be any better or worse than 
> RMA if there is substantial differential expression.
> 
> The problems really start before normalization.  Usually samples are 
> prepared for hybridization so that the total RNA or mRNA is at about 
> equal concentration.  So the mean on each array should be about the 
> same, even if there is substantially more down (up) regulation in one 
> condition.  If the samples are prepared so that they include the same 
> number of cells, you have some hope of observing bulk up or down 
> regulation - assuming that you can obtain the same yield/mRNA/cell.
> 
> --Naomi
> 
> At 05:54 PM 5/30/2006, Jenny Drnevich wrote:
>> Hi Naomi & others,
>>
>> This is the first I've heard (that I remember...) that loess normalization
>> is less susceptible to problems with a large percentage (~ >30%) of genes
>> changing. I'm wondering if this would be an OK method to use when comparing
>> different tissue types. I don't think there is any reason to expect the
>> average regulation up vs. down to be any different, although I could check
>> this by comparing mean expression level between the tissues at various
>> spans of A... I'm interested in this generally, although right now I'm
>> working on a set of Affy arrays. There is a cyclical loess normalization
>> available, and I'd like to get any opinions as to whether this might be a
>> better way to go than quantile normalizing the tissues separately, if I
>> want to find expression differences between tissues.
>>
>> Thanks,
>> Jenny
>>
>> At 12:05 PM 5/26/2006, Naomi Altman wrote:
>>> In many experiments, a large number of genes are differentially expressed.
>>> Loess normalization will continue to work reasonably well if, at each
>>> level of intensity "A", the average up and down regulation are
>>> about equal.  However, you would probably not want to count on this
>>> on small regions of the array, so it would probably be best to use a
>>> whole-array loess, rather than print-tip loess if many tips were used.
>>>
>>> The problem, of course, is to understand if the average up and down
>>> regulation are about equal.  E.g. if you are looking at transcription
>>> factor mutants, this would be a very bad assumption.
>>>
>>> --Naomi
>>>
>>> At 11:25 AM 5/26/2006, Kimpel, Mark William wrote:
>>>> Makis,
>>>>
>>>> I am speaking as a biologist, not as a statistician. Under conditions of
>>>> most biologic experiments, the assumption is that cells need to continue
>>>> mundane "housekeeping" functions and that these are minimally affected
>>>> by the differential conditions of the experiment. In my area, which is
>>>> neuroscience, we hope to see differential expression of genes involved
>>>> with neurotransmission or synaptic plasticity, but do not expect to see
>>>> differential expression of genes involved in just keeping neurons and
>>>> support cells alive and intact. It turns out that most genes are
>>>> involved in the latter, not the former, processes. We occasionally see
>>>> examples on this list, however, where very drastic experimental
>>>> conditions, such as one might see in toxicology, lead to differential
>>>> expression of a larger percentage of genes.
>>>>
>>>> It is important, then, to put your experiment into biologic context to
>>>> consider whether your current findings make sense and how best to
>>>> proceed with normalization and analysis. For instance, normalization
>>>> techniques that make sense when only a small percentage of genes are
>>>> differentially expressed may not be appropriate when a much larger
>>>> percentage of genes are differentially expressed (and I'll let the
>>>> statisticians on this list address what those procedures are and how to
>>>> decide which to use when).
>>>>
>>>> If you would, you might describe for the list the context of your
>>>> experiment so that others might know how best to advise you to proceed.
>>>>
>>>> Mark
>>>>
>>>> Mark W. Kimpel MD
>>>>
>>>> Indiana University School of Medicine
>>>>
>>>> .ch] On Behalf Of E Motakis, Mathematics
>>>> Sent: Friday, May 26, 2006 11:07 AM
>>>> To: Bioconductor
>>>> Subject: [BioC] Differential expression
>>>>
>>>> Dear all,
>>>>
>>>> I am working on two-colour microarray experiments and, from a set of
>>>> 42000 genes, I would like to identify the differentially expressed
>>>> ones. I have read several articles on this issue and most of them
>>>> imply that the number of differentially expressed genes in such
>>>> experiments should be small (compared to the whole set).
>>>>
>>>> Could anyone tell me why this is correct? What if I find half of the
>>>> genes to be differentially expressed according to the t-test p-value?
>>>>
>>>> I am not discussing the issue of p-values and q-values yet. I am asking
>>>> only about why most of the papers imply a low number of differentially
>>>> expressed genes.
>>>>
>>>> Thank you,
>>>> Makis
>>>>
>>>>
>>>> ----------------------
>>>> E Motakis, Mathematics
>>>> E.Motakis at bristol.ac.uk
>>>>