[BioC] effect of normalization on analysis of differential knockdown
Wolfgang Huber
whuber at embl.de
Mon Jul 20 12:12:36 CEST 2009
Hi Naomi,
Of course normalisation is useful. I want to point out the importance of
complementing it with quality assessment and control.
Simply comparing different normalisation 'black boxes' by the hit lists
they produce (which the original post seemed to hint at, and which has
all too often been done with microarray data in this community) is less
advisable.
Best wishes
Wolfgang
Naomi Altman wrote:
> Why would you bother to normalize if it did not affect the results of
> the analysis? The purpose of normalization is to dampen some of the
> noise so that the signal (i.e. differential expression) is clearer. The
> normalization method can have a huge effect, depending on how much noise
> there was in the experiment and whether the assumptions underlying the
> normalization are met.
> I am not familiar with B-score normalization. Normalization to the
> median of a particular treatment or control makes sense if you expect
> the median of all the samples to be the same except for noise. If not,
> e.g. if there is down-regulation but no up-regulation, then you are
> inducing signal by normalizing.
> --Naomi
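A toy simulation may help make the point about induced signal concrete. This is only a sketch under stated assumptions: the data are simulated, the 60% down-regulated fraction is deliberately exaggerated, and `median_center` is a hypothetical helper, not a function from any screening package.

```python
import random
import statistics

def median_center(values):
    """Subtract the sample median from every value."""
    med = statistics.median(values)
    return [v - med for v in values]

random.seed(0)
n_genes = 1000
n_down = 600  # assumption: 60% of genes are down-regulated, none up-regulated

# Hypothetical log-scale signals: the control has no effect anywhere.
control = [random.gauss(0.0, 0.1) for _ in range(n_genes)]
# Treated sample: the first n_down genes drop by 1 unit, the rest are unchanged.
treated = [random.gauss(-1.0 if i < n_down else 0.0, 0.1) for i in range(n_genes)]

control_n = median_center(control)
treated_n = median_center(treated)

# Mean treated-minus-control difference among genes that truly did not change.
unchanged_shift = statistics.mean(
    [t - c for t, c in zip(treated_n[n_down:], control_n[n_down:])]
)
print("apparent shift in unchanged genes:", round(unchanged_shift, 2))
```

Because the treated median sits inside the down-regulated group, centring on it pushes the unchanged genes upward, so they look up-regulated although nothing happened to them.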
>> Your computation of the t, p and q values seems reasonable to me. You may
>> want to choose a regularised version of the t-test (as in limma's eBayes)
>> since with only 4 samples you may otherwise get an unnecessarily large
>> fraction of false discoveries from sample variances that are small (and
>> t-statistics that are large) by chance.
>>
>> As for your question about the choice of normalisation method, one
>> (perhaps not too constructive, but not ignorable) possible answer is
>> that the technical or biological variability ("noise") in your data is
>> stronger than the biological signal.
>>
>> Best wishes
>> Wolfgang
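The concern about small sample variances can be illustrated with a short sketch. Note the assumptions: this pads the standard error with a fixed offset `s0` (in the spirit of SAM's fudge factor), which is a simpler technique than limma's eBayes, where variances are shrunk towards a prior estimated from all genes. The numbers and the value `s0=0.1` are made up for illustration.

```python
import math
import statistics

def paired_t(diffs, s0=0.0):
    """Paired t statistic on per-siRNA differences; s0 > 0 pads the standard error."""
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / (se + s0)

# Hypothetical differences (dosed - vehicle) for the 4 siRNAs of two genes.
tight_small = [0.10, 0.11, 0.09, 0.10]  # tiny effect, tiny variance by chance
noisy_large = [1.2, 0.6, 1.5, 0.9]      # large effect, ordinary variance

for name, diffs in [("tight_small", tight_small), ("noisy_large", noisy_large)]:
    print(name,
          "ordinary t:", round(paired_t(diffs), 2),
          "regularised t:", round(paired_t(diffs, s0=0.1), 2))
```

With the ordinary statistic the gene with the tiny but very consistent effect ranks first; once the denominator is padded, the gene with the large effect ranks first instead.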
>>> Hi, I am analysing the results from a drug sensitization siRNA screen
>>> and am trying to determine which genes are being differentially
>>> knocked down (between a vehicle only run and a dosed run).
>>> Each gene is targeted by 4 siRNAs and my initial strategy has been
>>> to consider the signals from the 4 siRNAs to be individual samples
>>> for that gene. Then I perform a paired t-test on the 4 signals for a
>>> given gene across the two conditions. I then calculate Storey's
>>> q-values based on the resultant p-values.
>>> The question: does/should the normalization of the plates have an
>>> effect on the results of the above analysis? For example, I
>>> considered two normalization schemes - 1) normalizing each plate to
>>> the median of a separate negative control plate and 2) B-score
>>> normalization.
>>> If I rank the genes based on their q-values I get 2 very different
>>> rankings for the two normalization schemes. Furthermore, the q- &
>>> p-values differ greatly. In the case of median normalization I get a
>>> number of q-values < 0.05 but when using B-score I get a single gene
>>> with a q-value < 0.05 (and the next closest value is 0.58).
>>> Thinking that this study is analogous to differential expression
>>> studies in microarrays, I tried running my dataset through the SAM
>>> method (via siggenes). Using this method, the B-score normalized data
>>> leads to no hits (and a pi0 = 1) whereas the median normalization
>>> method leads to lots of hits.
>>> I can see that B-score normalized data would differ in character from
>>> median normalized data (seeing that the actual signals are replaced
>>> with scaled residuals) - but is it to be expected that normalization
>>> schemes would lead to such different results in this type of analysis?
>>> Any pointers would be appreciated.
>>> Thanks,
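For readers who want to reproduce the workflow described in the original question, here is a minimal sketch of a paired t-test over the 4 siRNAs of a gene followed by Storey-style q-values. Several things are assumptions rather than the poster's actual code: the function names are hypothetical, the p-value uses the closed-form CDF of Student's t with 3 degrees of freedom (so it only works for exactly 4 pairs), and pi0 is estimated with a single fixed `lam=0.5` rather than the smoothing used by the R qvalue package.

```python
import math
import statistics

def paired_t_pvalue_n4(vehicle, dosed):
    """Two-sided paired t-test for exactly 4 pairs (3 degrees of freedom)."""
    if len(vehicle) != 4 or len(dosed) != 4:
        raise ValueError("the closed-form CDF below is only valid for 4 pairs")
    diffs = [d - v for v, d in zip(vehicle, dosed)]
    se = statistics.stdev(diffs) / math.sqrt(4)  # zero if all diffs are equal
    t = statistics.mean(diffs) / se
    # Closed-form CDF of Student's t with 3 degrees of freedom.
    x = abs(t) / math.sqrt(3)
    cdf = 0.5 + (x / (1 + x * x) + math.atan(x)) / math.pi
    return t, 2 * (1 - cdf)

def storey_qvalues(pvals, lam=0.5):
    """Storey-style q-values with one fixed lambda (no smoothing or bootstrap)."""
    m = len(pvals)
    # Estimated proportion of true nulls; can be 0 when very few tests are run.
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return pi0, q

# Hypothetical normalised signals for one gene (4 siRNAs, two conditions).
t_stat, p_val = paired_t_pvalue_n4([0.0, 0.0, 0.0, 0.0], [1.2, 0.6, 1.5, 0.9])
pi0_hat, q_vals = storey_qvalues([0.01, 0.02, 0.6, 0.8])
print(round(t_stat, 2), round(p_val, 4), pi0_hat, [round(q, 2) for q in q_vals])
```

Running both normalisations through the same functions and comparing the estimated pi0 alongside the q-values gives a quick view of how much of the difference comes from the p-value distribution as a whole rather than from individual genes.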
