[BioC] effect of normalization on analysis of differential knockdown

Naomi Altman naomi at stat.psu.edu
Sun Jul 19 05:14:20 CEST 2009

Why would you bother to normalize if it did not affect the results of
the analysis?  The purpose of normalization is to dampen some of the
noise so that the signal (i.e. differential expression) is clearer.
The normalization method can have a huge effect, depending on how much
noise there was in the experiment and whether the assumptions
underlying the normalization are met.
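
A minimal sketch in Python of how removing plate-level noise can sharpen a paired t statistic. This is an illustration only, not the analysis from the thread: the signals, the plate offsets, and the helper paired_t are all invented, and it assumes the negative controls recover each plate's offset exactly, which real data will not do.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic for the differences y - x."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical signals for the 4 siRNAs targeting one gene.
vehicle = [1.0, 1.2, 0.9, 1.1]
dosed_true = [0.5, 0.6, 0.5, 0.6]        # signal without plate effects

# Assumed additive plate effects on the dosed plates (pure noise).
plate_offset = [0.4, -0.3, 0.2, -0.3]
dosed_raw = [s + o for s, o in zip(dosed_true, plate_offset)]

# Idealized normalization: subtract each plate's offset, as if the
# negative controls estimated it without error.
dosed_norm = [s - o for s, o in zip(dosed_raw, plate_offset)]

t_raw = paired_t(vehicle, dosed_raw)
t_norm = paired_t(vehicle, dosed_norm)
print("t before normalization:", round(t_raw, 2))
print("t after normalization: ", round(t_norm, 2))
```

The mean difference is the same in both cases; only the spread of the paired differences changes, and that is what drives the t statistic.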

I am not familiar with B-score normalization.  Normalization to the 
median of a particular treatment or control makes sense if you expect 
the median of all the samples to be the same except for noise.  If 
not, e.g. if there is down-regulation but no up-regulation, then you 
are inducing signal by normalizing.
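
A small Python sketch of that last point, using made-up log-scale values for five genes (none of these numbers come from the screen discussed here). Three genes are truly down-regulated and two are unchanged; each sample is then centered on its own median, which assumes the medians should agree.

```python
from statistics import median

def median_center(values):
    """Subtract the sample median from every value."""
    m = median(values)
    return [v - m for v in values]

# Hypothetical log-scale signals for five genes in the control sample.
control = [9.0, 9.5, 10.0, 10.5, 11.0]

# Assumed true effects: down-regulation only, no up-regulation.
true_effect = [-1.0, -1.0, -1.0, 0.0, 0.0]
treated = [c + e for c, e in zip(control, true_effect)]

# Treated-minus-control differences before and after median centering.
raw_diff = [t - c for t, c in zip(treated, control)]
norm_diff = [t - c for t, c in
             zip(median_center(treated), median_center(control))]

print("raw differences:       ", raw_diff)
print("normalized differences:", norm_diff)
```

After centering, the three genes that really changed show no difference, and the two unchanged genes appear up-regulated: the signal has been moved, not cleaned.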


At 05:14 PM 7/18/2009, Wolfgang Huber wrote:

>Hi Rajarshi
>your t, p, q value computation seems reasonable to me. You may want 
>to choose a regularised version of the t-test (like in limma's 
>eBayes) since with only 4 samples, you may otherwise get an 
>unnecessarily large fraction of false discoveries due to the sample 
>variance being small (and t large) by chance.
>As for your question about the choice of normalisation method, one 
>possible answer (perhaps not too constructive, but not ignorable) is 
>that the technical or biological variability ("noise") in your data 
>is stronger than the biological signal.
>         Best wishes
>         Wolfgang
>Rajarshi Guha wrote:
>>Hi, I am analysing the results from a drug sensitization siRNA 
>>screen and am trying to determine which genes are being 
>>differentially knocked down (between a vehicle only run and a dosed run).
>>Each gene is targeted by 4 siRNAs and my initial strategy has been 
>>to consider the signals from the 4 siRNAs to be individual samples 
>>for that gene. Then I perform a paired t-test on the 4 signals for 
>>a given gene across the two conditions. I then calculate Storey's 
>>q-values based on the resultant p-values.
>>The question: does/should the normalization of the plates have an 
>>effect on the results of the above analysis? For example, I 
>>considered two normalization schemes - 1) normalizing each plate to 
>>the median of a separate negative control plate and 2) B-score normalization.
>>If I rank the genes based on their q-values I get 2 very different 
>>rankings for the two normalization schemes. Furthermore, the q- & 
>>p-values differ greatly. In the case of median normalization I get 
>>a number of q-values < 0.05 but when using B-score I get a single 
>>gene with a q-value < 0.05 (and the next closest value is 0.58).
>>Thinking that this study is analogous to differential expression 
>>studies in microarrays, I tried running my dataset through the SAM 
>>method (via siggenes). Using this method, the B-score normalized 
>>data leads to no hits (and a pi0 = 1) whereas the median 
>>normalization method leads to lots of hits.
>>I can see that B-score normalized data would differ in character 
>>from median normalized data (seeing that the actual signals are 
>>replaced with scaled residuals) - but is it to be expected that 
>>normalization schemes would lead to such different results in this 
>>type of analysis?
>>Any pointers would be appreciated.
>>Rajarshi Guha  <rajarshi.guha at gmail.com>
>>GPG Fingerprint: D070 5427 CC5B 7938 929C  DD13 66A1 922C 51E7 9E84
>>Q:  What's polite and works for the phone company?
>>A:  A deferential operator.

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111
