[BioC] effect of normalization on analysis of differential knockdown
Wolfgang Huber
whuber at embl.de
Mon Jul 20 12:12:36 CEST 2009
Hi Naomi,
Of course normalisation is useful. I want to point out the importance of
complementing it with quality assessment and control.
Simply comparing different normalisation 'black boxes' on the basis of
the resulting hit lists (of which there seemed to be a hint in the original
post, and which has all too often been done with microarray data in this
community) is less advisable.
Best wishes
Wolfgang
-------------------------------------------------------
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
-------------------------------------------------------
Naomi Altman wrote:
> Why would you bother to normalize if it did not affect the results of
> the analysis? The purpose of normalization is to dampen some of the
> noise so that the signal (i.e. differential expression) is clearer. The
> normalization method can have a huge effect, depending on how much noise
> there was in the experiment, and whether the assumptions underlying the
> normalization are met.
>
> I am not familiar with B-score normalization. Normalization to the
> median of a particular treatment or control makes sense if you expect
> the median of all the samples to be the same except for noise. If not,
> e.g. if there is down-regulation but no up-regulation, then you are
> inducing signal by normalizing.
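
The point about one-sided effects can be illustrated with a toy calculation. This is a minimal Python sketch, not part of the original thread: the values and the helper `center_on_median` are made up for illustration. Six of ten genes are truly down-regulated (on a log scale) and none is up-regulated. Centring each condition on its own median then makes the four unchanged genes look up-regulated.

```python
from statistics import median


def center_on_median(values):
    """Subtract the sample median from every value (hypothetical helper)."""
    m = median(values)
    return [v - m for v in values]


# Toy log-scale signals for 10 genes; all numbers are invented.
vehicle = [0.0] * 10                 # no effect anywhere
dosed = [-1.0] * 6 + [0.0] * 4       # 6 genes truly down, 4 unchanged

vehicle_norm = center_on_median(vehicle)
dosed_norm = center_on_median(dosed)

# Apparent effect per gene after each condition was centred separately.
apparent_effect = [d - v for d, v in zip(dosed_norm, vehicle_norm)]
print(apparent_effect)
```

In this toy case the truly affected genes end up with an apparent effect of zero and the unaffected ones with a positive effect. If only a minority of genes respond, the median barely moves and the artefact is correspondingly smaller.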
>
> --Naomi
>
> At 05:14 PM 7/18/2009, Wolfgang Huber wrote:
>
>> Hi Rajarshi
>>
>> Your t-, p- and q-value computation seems reasonable to me. You may want to
>> choose a regularised version of the t-test (like in limma's eBayes)
>> since with only 4 samples, you may otherwise get an unnecessarily
>> large fraction of false discoveries due to the sample variance being
>> small (and t large) by chance.
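
The effect of regularisation can be sketched in a few lines of Python. This is not limma's implementation: `eBayes` estimates the prior variance and prior degrees of freedom from all genes by empirical Bayes, whereas here `prior_var` and `prior_df` are simply fixed by hand as assumptions, and the four paired differences are invented. The sketch only shows how shrinking a tiny sample variance towards a prior value tames an inflated t-statistic.

```python
import math
from statistics import mean, variance


def ordinary_t(diffs):
    """One-sample t-statistic on paired differences."""
    n = len(diffs)
    return mean(diffs) / math.sqrt(variance(diffs) / n)


def moderated_t(diffs, prior_var, prior_df):
    """t-statistic with the sample variance shrunk towards prior_var.

    The shrunken variance is a weighted average of the prior variance and
    the sample variance, weighted by their degrees of freedom. prior_var
    and prior_df are assumed values here, not estimated from data.
    """
    n = len(diffs)
    df = n - 1
    shrunk_var = (prior_df * prior_var + df * variance(diffs)) / (prior_df + df)
    return mean(diffs) / math.sqrt(shrunk_var / n)


# Four invented paired differences that happen to be almost identical,
# so the sample variance is very small by chance.
diffs = [0.10, 0.11, 0.10, 0.11]

t_ordinary = ordinary_t(diffs)
t_moderated = moderated_t(diffs, prior_var=0.25, prior_df=4.0)
print(t_ordinary, t_moderated)
```

With these numbers the ordinary t-statistic is very large although the mean difference is small, while the moderated version is unremarkable. In a real analysis the prior parameters would come from the data rather than being chosen by hand.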
>>
>> As for your question about the choice of normalisation method, one
>> (perhaps not too constructive, but not ignorable) possible answer is
>> that the technical or biological variability ("noise") in your data is
>> stronger than the biological signal.
>>
>> Best wishes
>> Wolfgang
>>
>>
>> Rajarshi Guha wrote:
>>> Hi, I am analysing the results from a drug sensitization siRNA screen
>>> and am trying to determine which genes are being differentially
>>> knocked down (between a vehicle only run and a dosed run).
>>> Each gene is targeted by 4 siRNAs and my initial strategy has been
>>> to consider the signals from the 4 siRNAs to be individual samples
>>> for that gene. Then I perform a paired t-test on the 4 signals for a
>>> given gene across the two conditions. I then calculate Storey's
>>> q-values based on the resultant p-values.
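
The per-gene test described above can be sketched as follows. This is an illustrative Python version using only the standard library, not the poster's actual code. It is restricted to exactly four pairs, because the p-value uses the closed-form CDF of the t-distribution with 3 degrees of freedom. The function names and the signal values are assumptions made for the example.

```python
import math
from statistics import mean, stdev


def t_cdf_df3(t):
    """CDF of Student's t-distribution with 3 degrees of freedom."""
    x = t / math.sqrt(3.0)
    return 0.5 + (math.atan(x) + x / (1.0 + x * x)) / math.pi


def paired_t_test_4(vehicle, dosed):
    """Two-sided paired t-test for exactly four siRNA pairs of one gene.

    Raises ZeroDivisionError if all four differences are identical.
    """
    if len(vehicle) != 4 or len(dosed) != 4:
        raise ValueError("this sketch only handles 4 pairs (3 df)")
    diffs = [d - v for v, d in zip(vehicle, dosed)]
    std_err = stdev(diffs) / math.sqrt(len(diffs))
    t = mean(diffs) / std_err
    p = 2.0 * (1.0 - t_cdf_df3(abs(t)))
    return t, p


# Invented normalized signals for the 4 siRNAs of a single gene.
vehicle = [1.00, 0.90, 1.10, 1.05]
dosed = [0.70, 0.65, 0.85, 0.70]

t_stat, p_value = paired_t_test_4(vehicle, dosed)
print(t_stat, p_value)
```

Because the standard error is estimated from only four differences, the statistic is sensitive to how the normalization rescales or shifts the individual wells.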
>>> The question: does/should the normalization of the plates have an
>>> effect on the results of the above analysis? For example, I
>>> considered two normalization schemes - 1) normalizing each plate to
>>> the median of a separate negative control plate and 2) B-score
>>> normalization.
>>> If I rank the genes based on their q-values I get 2 very different
>>> rankings for the two normalization schemes. Furthermore, the q- &
>>> p-values differ greatly. In the case of median normalization I get a
>>> number of q-values < 0.05 but when using B-score I get a single gene
>>> with a q-value < 0.05 (and the next closest value is 0.58).
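
For reference, a simplified version of the q-value computation can be sketched in Python. It assumes a single fixed tuning parameter `lam` for the pi0 estimate; the R qvalue package can instead estimate pi0 over a range of such values, so this is not a reimplementation of it. The p-values are invented.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Simplified Storey q-values with a fixed lambda (illustrative only).

    pi0 is estimated as the fraction of p-values above lam, rescaled by
    1 / (1 - lam) and capped at 1. Returns (pi0, q-values in input order).
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    m = len(pvalues)
    if m == 0:
        return 1.0, []
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        qvalues[i] = running_min
    return pi0, qvalues


# Invented p-values for 8 genes.
pvals = [0.001, 0.002, 0.01, 0.02, 0.03, 0.04, 0.6, 0.9]
pi0, qvals = storey_qvalues(pvals)
print(pi0, qvals)
```

When pi0 is estimated as 1, this reduces to the Benjamini-Hochberg adjustment. Since every q-value depends on the whole set of p-values, a normalization that shifts many p-values upwards can move the entire ranking, not just individual genes.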
>>> Thinking that this study is analogous to differential expression
>>> studies in microarrays, I tried running my dataset through the SAM
>>> method (via siggenes). Using this method, the B-score normalized data
>>> leads to no hits (and a pi0 = 1) whereas the median normalization
>>> method leads to lots of hits.
>>> I can see that B-score normalized data would differ in character from
>>> median normalized data (seeing that the actual signals are replaced
>>> with scaled residuals) - but is it to be expected that normalization
>>> schemes would lead to such different results in this type of analysis?
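
To make the "scaled residuals" concrete, here is a minimal Python sketch of a B-score-style computation: a two-way median polish over the rows and columns of one plate, with the residuals divided by their median absolute deviation (MAD). The plate layout, the simulated values, the number of iterations and the MAD scaling constant are all assumptions for illustration; implementations in screening packages may differ in these details.

```python
import random
from statistics import median


def median_polish_residuals(plate, n_iter=10):
    """Alternately remove row and column medians; return the residuals."""
    resid = [row[:] for row in plate]
    n_rows, n_cols = len(resid), len(resid[0])
    for _ in range(n_iter):
        for r in range(n_rows):
            row_med = median(resid[r])
            resid[r] = [v - row_med for v in resid[r]]
        for c in range(n_cols):
            col_med = median(resid[r][c] for r in range(n_rows))
            for r in range(n_rows):
                resid[r][c] -= col_med
    return resid


def b_scores(plate, n_iter=10, mad_constant=1.4826):
    """Median-polish residuals divided by their scaled MAD (a sketch)."""
    resid = median_polish_residuals(plate, n_iter)
    flat = [v for row in resid for v in row]
    centre = median(flat)
    mad = mad_constant * median(abs(v - centre) for v in flat)
    if mad == 0:
        raise ValueError("MAD of residuals is zero; cannot scale")
    return [[v / mad for v in row] for row in resid]


# Simulated 8 x 12 plate: additive row and column trends plus noise.
rng = random.Random(0)
n_rows, n_cols = 8, 12
plate = [[0.2 * r + 0.1 * c + rng.gauss(0.0, 0.1) for c in range(n_cols)]
         for r in range(n_rows)]
plate[3][5] -= 10.0  # one strong, invented hit

scores = b_scores(plate)
print(scores[3][5])
```

Unlike normalization to a control median, this removes positional trends within the plate and rescales by the plate's own residual spread, so the two schemes put the data on quite different scales before any test is applied.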
>>> Any pointers would be appreciated.
>>> Thanks,
>>> -------------------------------------------------------------------
>>> Rajarshi Guha <rajarshi.guha at gmail.com>
>>> GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
>>> -------------------------------------------------------------------
>>> Q: What's polite and works for the phone company?
>>> A: A deferential operator.
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348 (Statistics)
> University Park, PA 16802-2111