[BioC] loess and duplicate correlation

Thu Mar 31 19:33:12 CEST 2005

Hi Gordon,
Thanks, that clears up the issue for me. I had a hunch this might be the 
case but I wanted to hear it from someone who understands the statistics 
better than I. The reason I say I'm "wary of normalization" isn't that I 
dispute the evidence that it removes unwanted variation inherent in the 
technology. It is mainly because I'm wary of my ability to apply it 
correctly. ;-)
-d

Gordon Smyth wrote:

>
>> Date: Wed, 30 Mar 2005 10:55:50 -0800
>> From: Dennis Hazelett <hazelett at uoneuro.uoregon.edu>
>> Subject: [BioC] loess and duplicate correlation
>> To: bioconductor at stat.math.ethz.ch
>>
>> Hello bioconductors,
>> I fit a linear model to my data with 3 coefficients. I used loess
>> normalization on GenePix data with no background correction. With my
>> data set, loess normalization resulted in slight reductions in p values
>> (relative to "median" normalization for example) and reordering of the
>> lists of DE genes for all three coefficients, which I took to be a good
>> sign. I also have a series of replicate spots, and running
>> duplicateCorrelation and including the consensus correlation (~0.55)
>> term in my linear fit further improved the p values and resulted in some
>> changes in the lists of DE genes. All of this suggests to me that loess
>> and duplicate correlation served to reduce the estimate of variance in
>> gene expression and weed out artifacts.
>
>
> Actually the two processes have different purposes. Loess 
> normalization reduces the residual variability. Duplicate correlation 
> does not do this; rather, it assesses the residual variability more 
> realistically, so p-values may go up or down as a consequence.
>
>> However because I'm a little wary of normalisation,
>
>
> Given the enormous weight of evidence showing that microarray data 
> needs to be normalised, I'm wary of unnormalized data.
>
>>  I took my raw data
>> set, non-normalized and non-background corrected and ran
>> duplicateCorrelation on it. For un-normalized data the consensus
>> correlation is ~0.73, quite a bit higher than for the loess-normalized
>> data.
>
>
> Effective normalisation improves the consistency of results between 
> arrays, and hence the duplicate correlation, which compares the 
> similarity within arrays to that between arrays, will tend to 
> decrease. This is to be expected.
>
>>  After running the same lmFit model with this data set I once again
>> obtained different lists of DE genes, with many of the strongest
>> conclusions carrying over, giving me confidence that I applied the
>> correct methods and function calls. My question is, should I be
>> suspicious of the normalized data set? Am I at significant risk of
>> generating large numbers of artifactual DE genes?
>> -Dennis
>
>
> You haven't stated any reason for suspicion -- you seem to have had 
> only good experience -- so it is hard to know what further to say.
>
> Gordon
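
Gordon's point that the duplicate correlation tends to fall after
normalisation can be illustrated with a small simulation. This is only a
sketch and not part of the original exchange. It is written in plain Python
rather than limma, it uses per-array median centring as a simple stand-in
for loess, and it estimates the correlation with a pooled Pearson
correlation of duplicate-spot residuals instead of the estimator used by
duplicateCorrelation. All function names, parameter values and the
simulated variance components are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_arrays(n_genes=500, n_arrays=6, array_sd=0.5,
                    gene_array_sd=0.4, spot_sd=0.4, seed=1):
    """Simulate log-ratios with two duplicate spots per gene per array.

    Each array gets its own offset (an unwanted array-wide effect),
    each gene/array pair gets an effect shared by both duplicates,
    and each spot gets independent noise.
    """
    rng = random.Random(seed)
    offsets = [rng.gauss(0, array_sd) for _ in range(n_arrays)]
    data = []
    for _ in range(n_genes):
        row = []
        for a in range(n_arrays):
            shared = rng.gauss(0, gene_array_sd)
            spot1 = offsets[a] + shared + rng.gauss(0, spot_sd)
            spot2 = offsets[a] + shared + rng.gauss(0, spot_sd)
            row.append((spot1, spot2))
        data.append(row)
    return data


def median_normalize(data):
    """Subtract each array's median log-ratio (stand-in for loess)."""
    n_arrays = len(data[0])
    medians = [
        statistics.median(v for gene in data for v in gene[a])
        for a in range(n_arrays)
    ]
    return [
        [(s1 - medians[a], s2 - medians[a]) for a, (s1, s2) in enumerate(gene)]
        for gene in data
    ]


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def duplicate_correlation(data):
    """Correlate duplicate-spot residuals after removing each gene's mean."""
    r1, r2 = [], []
    for gene in data:
        values = [v for pair in gene for v in pair]
        gene_mean = sum(values) / len(values)
        for s1, s2 in gene:
            r1.append(s1 - gene_mean)
            r2.append(s2 - gene_mean)
    return pearson(r1, r2)


if __name__ == "__main__":
    raw = simulate_arrays()
    normalized = median_normalize(raw)
    print("raw:       ", round(duplicate_correlation(raw), 2))
    print("normalized:", round(duplicate_correlation(normalized), 2))
```

In this toy setup the array offsets are shared by both duplicates on the
same array, so they inflate the within-array correlation of the raw data.
Removing them leaves only the gene-specific shared effect, and the
correlation should come out lower, which is the pattern described above.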