[BioC] reproducing dChip expression measure

Wed Apr 13 19:36:37 CEST 2005

On Wed, Apr 13, 2005 at 02:42:37PM +0100, Adaikalavan Ramasamy wrote:
> Dear Naomi, thank you for the response. Please see my response.
> 
> 
> On Mon, 2005-04-11 at 19:14 -0400, Naomi Altman wrote:
> > I think you will find that any 2 reasonable Affy normalization methods have 
> 
> I am comparing the same expression measure (Li-Wong) as computed by two
> different software packages (dChip and BioConductor).
> 
> > very high correlation.  In the Irizarry et al paper on cross-lab and 
> > cross-platform comparisons this is called the "probe effect" and is due to 
> > the fact that the range of expression values is huge and the normalization 
> > methods do a reasonable job of preserving the ordering.
> > However, this correlation does not translate into much overlap in the set 
> > of genes that are declared DE.
> 
> Very interesting paper indeed. Thank you for pointing it out. I will
> need to read more of it though.
> 
> > A better measure of closeness of the 2 normalizations is the MA plot of the 
> > normalized values on the same array, using the 2 normalizations.
> 
> The MA plot is simply a 45-degree rotation of the scatter plot, so I
> prefer to look at the scatterplots directly. True, I should have done

That is simply a wrong preference. While I agree that the two plots 
contain the same mathematical object, the same could be said if I 
produced a plot with extremely skewed axes. Far too many scientists 
(statisticians included) tend to think that if two plots contain the 
same numbers, they are equivalent.

We (humans) generally find it much easier to gauge horizontal and 
vertical lines than diagonal ones. One of the principal tasks in an MvA 
plot is to see whether the points follow a horizontal line or whether 
there is any systematic deviation from it. And when you have to make 
that judgement, it is much easier to do (correctly) on the basis of an 
MvA plot.
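
A minimal sketch of the comparison in base R, assuming simulated values rather than the data discussed in this thread. The vectors `x` and `y` are hypothetical stand-ins for the expression measures of the same probesets from two programs, with an assumed 1.2-fold scale offset between them. The offset is hard to read off the diagonal of a scatterplot, but appears as a horizontal shift away from M = 0 in the MvA plot.

```r
# Simulated expression values (hypothetical, not the thread's data)
set.seed(1)
truth <- 2^runif(1000, min = 4, max = 14)
x <- truth * 2^rnorm(1000, sd = 0.2)        # stand-in for program 1
y <- 1.2 * truth * 2^rnorm(1000, sd = 0.2)  # stand-in for program 2, assumed 1.2-fold offset

# MvA coordinates on the log2 scale
M <- log2(y) - log2(x)        # log ratio: deviation from perfect agreement
A <- (log2(x) + log2(y)) / 2  # average log intensity

if (interactive()) {
  op <- par(mfrow = c(1, 2))
  plot(log2(x), log2(y), main = "Scatterplot")
  abline(0, 1, col = "red")   # line of identity
  plot(A, M, main = "MvA plot")
  abline(h = 0, col = "red")  # the same line after rotation
  par(op)
}
```

Both panels show the same 1000 pairs of numbers; only the reference line changes from diagonal to horizontal.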

Try, e.g., fitting a simple linear regression. Think of two plots:
1) you plot the points and the fitted line
2) you plot the residuals
While the residuals are visible in plot 1, plot 2 is much better for 
assessing them.
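
The regression analogy can be sketched in base R as follows. The data are simulated with a small curvature term that I have made up for illustration; the straight-line fit and the variable names are assumptions, not anything from the thread.

```r
# Simulated data with a mild, assumed curvature around a linear trend
set.seed(2)
x <- seq(1, 10, length.out = 50)
y <- 2 + 0.5 * x + 0.05 * (x - 5.5)^2 + rnorm(50, sd = 0.3)

fit <- lm(y ~ x)   # simple linear regression
res <- resid(fit)  # residuals from the fitted line

if (interactive()) {
  op <- par(mfrow = c(1, 2))
  plot(x, y, main = "1) points and fitted line")
  abline(fit, col = "red")
  plot(x, res, main = "2) residuals")
  abline(h = 0, col = "red")
  par(op)
}
```

In plot 2 the reference is a horizontal line at zero, so any systematic pattern in the residuals is judged against a flat baseline instead of a sloped one.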

Kasper

> the scatterplot on an array-by-array basis but I am not too keen on
> looking at 48 (= 12 arrays x 4 ways ) plots.
> 
> > Incidentally, I have never used the Li-Wong method, but I understand that 
> > it requires a fairly large data set (i.e. arrays/condition), so the 
> > differences between dChip and BioC may just be failure to converge.
> 
> Very good point. I did not even consider this. I wonder how stable the
> expression measure is across different runs within R itself.
> 
> > --Naomi
> > 
> > At 11:01 AM 4/7/2005, Adaikalavan Ramasamy wrote:
> > >I am trying to reproduce the dChip expression measure from the dChip
> > >software with BioConductor packages. I am aware that dChip is not open
> > >source but I would like to get as close as I can. Thus, I compare the
> > >dChip expression measure from both software packages applied to a small
> > >dataset of 12 arrays with approximately 16000 probesets.
> > >
> > >Going through the mailing list archive, I found that I can use the
> > >following combinations of parameter values to feed to expresso
> > >
> > >         model   pmcorrect.method   bgcorrect.method
> > >         1       "pmonly"           "none"
> > >         2       "subtractmm"       "none"
> > >         3       "pmonly"           "mas"
> > >         4       "subtractmm"       "mas"
> > >
> > >with the following generic incantation to expresso :
> > >
> > >   expresso( ReadAffy(), normalize.method="invariantset",
> > >             bgcorrect.method=???, pmcorrect.method=???,
> > >             summary.method="liwong"
> > >           )
> > >
> > >
> > >The correlations of the values are high and similar (around 0.90). I
> > >have attached both the scatterplot and hexbin of expression measures
> > >from these two software packages under the different models, with the
> > >line of identity in red. They suggest that:
> > >
> > >a) The majority of the values are concentrated in the lower region
> > >b) There appear to be highly correlated values at the higher end, but
> > >they are not perfectly identical
> > >c) the MM-subtracted data give more disagreement in the lower range but
> > >are much closer to the line of identity in the higher range
> > >d) mas5 background correction does not appear to make much difference
> > >
> > >
> > >Can other members of the list comment on
> > >a) whether they have seen similar findings
> > >b) whether these results are expected and sensible
> > >c) what else I can try to increase the reproducibility
> > >
> > >
> > >Eventually I plan on applying BioConductor's version of dChip expression
> > >measure to a few other datasets, so it would be useful to use the most
> > >reproducible version from BioConductor.
> > >
> > >Thank you very much.
> > >
> > >Regards, Adai
> > >
> > >_______________________________________________
> > >Bioconductor mailing list
> > >Bioconductor at stat.math.ethz.ch
> > >https://stat.ethz.ch/mailman/listinfo/bioconductor
> > 
> > Naomi S. Altman                                814-865-3791 (voice)
> > Associate Professor
> > Bioinformatics Consulting Center
> > Dept. of Statistics                              814-863-7114 (fax)
> > Penn State University                         814-865-1348 (Statistics)
> > University Park, PA 16802-2111
> > 
> >
> 

-- 
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen