[BioC] effect of normalization on analysis of differential knockdown

Wolfgang Huber huber at ebi.ac.uk
Sat Jul 18 23:14:20 CEST 2009


Hi Rajarshi

Your t-, p- and q-value computation seems reasonable to me. You may want 
to use a regularised version of the t-test (like the moderated t in 
limma's eBayes), since with only 4 samples you may otherwise get an 
unnecessarily large fraction of false discoveries: for some genes the 
sample variance will be small (and t large) purely by chance.
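To make the point about regularisation concrete, here is a minimal pure-Python sketch of two regularised paired t-statistics for one gene. The first uses limma-style variance shrinkage, where the gene's sample variance s² is pooled with a prior variance s0² carrying d0 degrees of freedom. The second uses SAM's additive "fudge factor" s0 in the denominator. This is an illustration under stated assumptions, not limma's or siggenes' code: both packages estimate the prior (or the fudge factor) from all genes, whereas here `s0_sq`, `d0` and `s0` are simply passed in, and the function names are hypothetical. Setting `d0 = 0` or `s0 = 0` recovers the ordinary paired t-statistic.

```python
# Illustrative sketch; function names are hypothetical, not limma/siggenes APIs.
import math
import statistics


def paired_differences(vehicle, dosed):
    """Per-siRNA differences (dosed minus vehicle) for one gene."""
    if len(vehicle) != len(dosed):
        raise ValueError("paired samples must have the same length")
    return [d - v for v, d in zip(vehicle, dosed)]


def moderated_paired_t(vehicle, dosed, s0_sq, d0):
    """limma-style moderated t: shrink the gene's variance toward s0_sq.

    s0_sq (prior variance) and d0 (prior degrees of freedom) are assumed
    to be given; limma's eBayes would estimate them from all genes.
    Returns the statistic and its degrees of freedom (d0 + n - 1).
    """
    diffs = paired_differences(vehicle, dosed)
    n = len(diffs)
    d = n - 1                                  # residual degrees of freedom
    s_sq = statistics.variance(diffs)          # sample variance (n - 1 denominator)
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    return statistics.fmean(diffs) / math.sqrt(s_tilde_sq / n), d0 + d


def sam_paired_d(vehicle, dosed, s0):
    """SAM-style relative difference: mean difference / (standard error + s0).

    s0 is assumed to be given; siggenes chooses it from the data.
    """
    diffs = paired_differences(vehicle, dosed)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / (se + s0)


if __name__ == "__main__":
    # Hypothetical gene whose four siRNAs happen to agree very closely.
    vehicle = [1.00, 1.00, 1.00, 1.00]
    dosed = [1.50, 1.52, 1.49, 1.51]
    print("ordinary paired t:", moderated_paired_t(vehicle, dosed, 0.0, 0))
    print("moderated t:      ", moderated_paired_t(vehicle, dosed, 0.04, 4))
    print("SAM d:            ", sam_paired_d(vehicle, dosed, 0.05))
```

With these made-up numbers the ordinary paired t is roughly 78 on 3 degrees of freedom. The moderated t is roughly 6.7 on 7 degrees of freedom, and SAM's d is roughly 8.9. In other words, the near-zero variance that inflated the ordinary statistic no longer dominates. The prior values 0.04, 4 and 0.05 are arbitrary choices for illustration.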

As for your question about the choice of normalisation method, one 
(perhaps not too constructive, but not ignorable) possible answer is 
that the technical or biological variability ("noise") in your data is 
stronger than the biological signal.
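The two normalisation schemes in the question quoted below can be sketched in pure Python to see what each one preserves. In this sketch the function names are hypothetical and a plate is a list of rows. The B-score follows the usual recipe: Tukey median polish within a plate, then residuals divided by their median absolute deviation (MAD). Some implementations also multiply the MAD by 1.4826, which this sketch does not do. Because median polish subtracts row and column medians, adding a constant to every well, or multiplying every well by a positive constant, leaves a plate's B-scores unchanged. Dividing by a fixed control-plate median, by contrast, keeps such plate-wide differences.

```python
# Illustrative sketch; function names are hypothetical.
import random
import statistics


def control_median_normalize(plate, control_plate):
    """Scheme 1: divide every well by the median of a negative-control plate."""
    ref = statistics.median(v for row in control_plate for v in row)
    return [[v / ref for v in row] for row in plate]


def median_polish_residuals(plate, n_iter=10):
    """Tukey median polish: repeatedly subtract row medians, then column medians."""
    resid = [list(row) for row in plate]
    n_rows, n_cols = len(resid), len(resid[0])
    for _ in range(n_iter):
        for i in range(n_rows):
            row_med = statistics.median(resid[i])
            resid[i] = [v - row_med for v in resid[i]]
        for j in range(n_cols):
            col_med = statistics.median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= col_med
    return resid


def b_score(plate, n_iter=10):
    """Scheme 2: median-polish residuals divided by their (unscaled) MAD."""
    resid = median_polish_residuals(plate, n_iter)
    flat = [v for row in resid for v in row]
    center = statistics.median(flat)
    mad = statistics.median(abs(v - center) for v in flat)
    # A MAD of 0 (e.g. a perfectly additive plate) raises ZeroDivisionError here.
    return [[v / mad for v in row] for row in resid]


if __name__ == "__main__":
    rng = random.Random(0)
    plate = [[rng.gauss(100.0, 10.0) for _ in range(12)] for _ in range(8)]  # simulated 96-well plate
    shifted = [[1.3 * v + 25.0 for v in row] for row in plate]                # plate-wide scale and offset
    diff = max(abs(a - b)
               for r1, r2 in zip(b_score(plate), b_score(shifted))
               for a, b in zip(r1, r2))
    print(f"largest B-score change after rescaling the plate: {diff:.1e}")
```

Any statistic computed after B-scoring therefore compares genes in each plate's row/column-corrected, MAD-scaled units. The control-median scheme instead keeps the raw signal scale, up to one constant.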

	Best wishes
	Wolfgang


Rajarshi Guha wrote:
> Hi, I am analysing the results from a drug sensitization siRNA screen 
> and am trying to determine which genes are being differentially knocked 
> down (between a vehicle only run and a dosed run).
> 
> Each gene is targeted by 4 siRNAs and my initial strategy has been to 
> consider the signals from the 4 siRNAs to be individual samples for 
> that gene. Then I perform a paired t-test on the 4 signals for a given 
> gene across the two conditions. I then calculate Storey's q-values based 
> on the resultant p-values.
> 
> The question: does/should the normalization of the plates have an effect 
> on the results of the above analysis? For example, I considered two 
> normalization schemes: (1) normalizing each plate to the median of a 
> separate negative-control plate and (2) B-score normalization.
> 
> If I rank the genes based on their q-values I get 2 very different 
> rankings for the two normalization schemes. Furthermore, the q- and 
> p-values differ greatly. In the case of median normalization I get a 
> number of q-values < 0.05 but when using B-score I get a single gene 
> with a q-value < 0.05 (and the next closest value is 0.58).
> 
> Thinking that this study is analogous to differential expression studies 
> in microarrays, I tried running my dataset through the SAM method (via 
> siggenes). Using this method, the B-score normalized data leads to no 
> hits (and a pi0 = 1) whereas the median normalization method leads to 
> lots of hits.
> 
> I can see that B-score normalized data would differ in character from 
> median normalized data (seeing that the actual signals are replaced with 
> scaled residuals) - but is it to be expected that normalization schemes 
> would lead to such different results in this type of analysis?
> 
> Any pointers would be appreciated.
> 
> Thanks,
> 
> -------------------------------------------------------------------
> Rajarshi Guha  <rajarshi.guha at gmail.com>
> GPG Fingerprint: D070 5427 CC5B 7938 929C  DD13 66A1 922C 51E7 9E84
> -------------------------------------------------------------------
> Q:  What's polite and works for the phone company?
> A:  A deferential operator.
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor
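The remaining steps of the pipeline in the quoted message can also be sketched in pure Python. The sketch computes Storey's q-values from a list of p-values and offers two simple ways to quantify how much two gene rankings (say, median-normalised versus B-scored) disagree. Several assumptions need flagging. π0 is estimated at a single fixed λ = 0.5, whereas the Bioconductor qvalue package by default smooths the estimate over a grid of λ values. The fallback to π0 = 1 when no p-value exceeds λ is this sketch's own choice. All function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical, not the qvalue package API.
import math
import statistics


def storey_qvalues(pvalues, lam=0.5):
    """Storey q-values with pi0 estimated at one fixed lambda."""
    m = len(pvalues)
    if m == 0:
        raise ValueError("need at least one p-value")
    # pi0: estimated fraction of true nulls, from the p-values above lam.
    pi0 = sum(p > lam for p in pvalues) / (m * (1.0 - lam))
    if pi0 == 0.0:
        pi0 = 1.0  # this sketch's fallback: conservative, equals Benjamini-Hochberg
    pi0 = min(pi0, 1.0)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q, pi0


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))


def top_k_overlap(x, y, k):
    """Fraction of the k smallest entries of x that are also among the k smallest of y."""
    top_x = set(sorted(range(len(x)), key=lambda i: x[i])[:k])
    top_y = set(sorted(range(len(y)), key=lambda i: y[i])[:k])
    return len(top_x & top_y) / k
```

Given two hypothetical per-gene p-value lists `p_median` and `p_bscore` in the same gene order, one would compute `q_median, _ = storey_qvalues(p_median)` and `q_bscore, _ = storey_qvalues(p_bscore)`. Then `spearman_rho(q_median, q_bscore)` and `top_k_overlap(q_median, q_bscore, 20)` each summarise agreement in one number. When most q-values are tied (for example, near 1), the rank correlation says little, and `spearman_rho` fails if either list is entirely tied.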


-- 

-------------------------------------------------------
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber


