[BioC] loess and duplicate correlation
hazelett at uoneuro.uoregon.edu
Thu Mar 31 19:33:12 CEST 2005
Thanks, that clears up the issue for me. I had a hunch this might be the
case but I wanted to hear it from someone who understands the statistics
better than I. The reason I say I'm "wary of normalization" isn't that I
dispute the evidence that it removes unwanted variation inherent in the
technology. It is mainly because I'm wary of my ability to apply it
Gordon Smyth wrote:
>> Date: Wed, 30 Mar 2005 10:55:50 -0800
>> From: Dennis Hazelett <hazelett at uoneuro.uoregon.edu>
>> Subject: [BioC] loess and duplicate correlation
>> To: bioconductor at stat.math.ethz.ch
>> Hello bioconductors,
>> I fit a linear model to my data with 3 coefficients. I used loess
>> normalization on genepix data with no background correction. With my
>> data set, loess normalization resulted in slight reductions in p values
>> (relative to "median" normalization for example) and reordering of the
>> lists of DE genes for all three coefficients, which I took to be a good
>> sign. I also have a series of replicate spots, and running
>> duplicateCorrelation and including the consensus correlation (~0.55)
>> term in my linear fit further improved the p values and resulted in some
>> changes in the lists of DE genes. All of this suggests to me that loess
>> and duplicate correlation served to reduce the estimate of variance in
>> gene expression and weed out artifacts.
> Actually the two processes have different purposes. Loess
> normalization reduces the residual variability. Duplicate correlation
> does not do this, rather it assesses the residual variability more
> realistically -- p-values may go up or down as a consequence.
>> However because I'm a little wary of normalisation,
> Given the enormous weight of evidence showing that microarray data
> needs to be normalised, I'm wary of unnormalized data.
>> I took my raw data
>> set, non-normalized and non-background corrected and ran
>> duplicateCorrelation on it. For un-normalized data the consensus
>> correlation is ~0.73, quite a bit higher than for the loess-normalized
> Effective normalisation improves the consistency of results between
> arrays, and hence the duplicate correlation, which measures the
> similarity within arrays relative to that between arrays, will tend
> to decrease. This is to be expected.
>> After running the same lmFit model with this data set I once again
>> obtained different lists of DE genes, with many of the strongest
>> conclusions carrying over, giving me confidence that I applied the
>> correct methods and function calls. My question is, should I be
>> suspicious of the normalized data set? Am I at significant risk of
>> generating large numbers of artifactual DE genes?
> You haven't stated any reason for suspicion -- you seem to have had
> only good experience -- so it is hard to know what further to say.
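
For anyone who wants to see the point about duplicate correlation in numbers, here is a toy simulation in base R. It is only a sketch under assumptions of my own and is not limma's method: duplicateCorrelation fits a mixed model by REML, whereas this just averages per-gene Pearson correlations between the two duplicates on the Fisher z scale. The variable names, the standard deviations, and the idea that normalisation removes the shared artifact completely are all made up for illustration.

```r
# Toy model: each log-ratio = array-level biological variation
#                           + array-level technical artifact (shared by duplicates)
#                           + spot-level noise (independent per duplicate).
# "Normalisation" is idealised here as removing the technical artifact entirely.
set.seed(1)
n_genes  <- 2000
n_arrays <- 8
sd_bio   <- 0.5   # hypothetical values, chosen only for illustration
sd_tech  <- 0.6
sd_noise <- 0.4

sim <- function(sd) matrix(rnorm(n_genes * n_arrays, sd = sd), n_genes, n_arrays)
bio  <- sim(sd_bio)
tech <- sim(sd_tech)
e1   <- sim(sd_noise)   # noise on duplicate 1
e2   <- sim(sd_noise)   # noise on duplicate 2

# Simplified consensus: per-gene correlation between duplicates across arrays,
# averaged on the atanh (Fisher z) scale and transformed back.
consensus <- function(d1, d2) {
  r <- sapply(seq_len(nrow(d1)), function(g) cor(d1[g, ], d2[g, ]))
  tanh(mean(atanh(r)))
}

cor_raw  <- consensus(bio + tech + e1, bio + tech + e2)  # artifact still present
cor_norm <- consensus(bio + e1, bio + e2)                # artifact removed

print(c(raw = cor_raw, normalised = cor_norm))
```

In this setup the duplicates on an array share less common variation once the artifact is gone, so the consensus value for the "normalised" data comes out lower than for the raw data, in line with the explanation quoted above. It says nothing about how large the drop should be on real arrays.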