[BioC] reproducing dChip expression measure

Naomi Altman naomi at stat.psu.edu
Tue Apr 12 01:14:46 CEST 2005

I think you will find that any two reasonable Affy normalization methods have
very high correlation. In the Irizarry et al. paper on cross-lab and
cross-platform comparisons, this is called the "probe effect". It arises because
the range of expression values is huge and the normalization
methods do a reasonable job of preserving the ordering.
However, this correlation does not translate into much overlap in the sets
of genes that are declared differentially expressed (DE).
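The sketch below makes the distinction concrete. It uses base R only and scores two normalizations of the same arrays in two ways: the correlation of all expression values, and the overlap of the top-k genes. One assumption needs flagging: ranking genes by absolute log fold change between two conditions is a simplifying stand-in for a real DE test. The function name `compare_normalizations` and the toy data are hypothetical.

```r
# Score two normalizations of the same arrays in two ways (base R only).
# expr1, expr2: genes x arrays matrices of log2 values from the two
#   normalizations, with genes and arrays in the same order
# group: two-level factor giving the condition of each array
# k: number of top-ranked genes treated as "declared DE" (assumption:
#   ranking by |log fold change| stands in for a real DE test)
compare_normalizations <- function(expr1, expr2, group, k) {
  stopifnot(identical(dim(expr1), dim(expr2)), nlevels(group) == 2)
  # Difference of condition means per gene (log2 fold change)
  log_fc <- function(e) {
    rowMeans(e[, group == levels(group)[2], drop = FALSE]) -
      rowMeans(e[, group == levels(group)[1], drop = FALSE])
  }
  # Indices of the k genes with the largest absolute fold change
  top_k <- function(s) order(abs(s), decreasing = TRUE)[seq_len(k)]
  list(
    correlation = cor(as.vector(expr1), as.vector(expr2)),
    overlap = length(intersect(top_k(log_fc(expr1)), top_k(log_fc(expr2)))) / k
  )
}

# Toy data (made up for illustration): 4 genes on 4 arrays, 2 per condition.
expr1 <- rbind(c(2, 2, 3, 3),
               c(6, 6, 6.5, 6.5),
               c(10, 10, 10.2, 10.2),
               c(14, 14, 14, 14))
expr2 <- expr1
expr2[1, 3:4] <- 2.1   # gene 1 loses most of its change
expr2[3, 3:4] <- 10.9  # gene 3 gains a larger change
group <- factor(c("A", "A", "B", "B"))
res <- compare_normalizations(expr1, expr2, group, k = 2)
# res$correlation is above 0.99, yet res$overlap is 0.5
```

The wide spread of baseline values (2 to 14 on the log2 scale) dominates the correlation. The small per-gene shifts, which decide the DE ranking, barely register in it.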

A better way to judge how close the two normalizations are is an MA plot
comparing the normalized values that each method produces for the same array.
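Below is a minimal base R sketch of that MA plot, assuming `x` and `y` hold the values for the same probesets on the same array from normalization 1 and normalization 2. For each probeset, M = log2(x) - log2(y) measures the disagreement, and A = (log2(x) + log2(y)) / 2 places it on the intensity range. The names `ma_values`, `ma_plot` and `log_scale` are hypothetical and are not functions from affy.

```r
# Minimal MA-plot sketch (base R only). x and y: values for the same
# probesets on the same array, from normalization 1 and normalization 2.
ma_values <- function(x, y, log_scale = FALSE) {
  stopifnot(length(x) == length(y))
  # Convert to log2 unless the values are already log2-transformed
  lx <- if (log_scale) x else log2(x)
  ly <- if (log_scale) y else log2(y)
  data.frame(
    A = (lx + ly) / 2,  # average log2 value: position on the intensity range
    M = lx - ly         # log2 ratio: disagreement between the normalizations
  )
}

ma_plot <- function(ma, ...) {
  # Drop non-finite points, e.g. log2 of non-positive values
  ok <- is.finite(ma$A) & is.finite(ma$M)
  plot(ma$A[ok], ma$M[ok], pch = ".",
       xlab = "A (average log2 value)", ylab = "M (log2 ratio)", ...)
  abline(h = 0, col = "red")                       # perfect agreement
  lines(lowess(ma$A[ok], ma$M[ok]), col = "blue")  # intensity-dependent trend
}
```

A possible call is `ma_plot(ma_values(exprs(eset1)[, 1], exprs(eset2)[, 1]))`, where `eset1` and `eset2` are hypothetical expression sets from the two normalizations. Points scattered evenly around M = 0 indicate agreement. A lowess curve that bends away from zero at low A shows an intensity-dependent difference, which a single correlation coefficient hides.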

Incidentally, I have never used the Li-Wong method, but I understand that
it requires a fairly large data set (i.e. many arrays per condition), so the
differences between dChip and BioC may simply reflect a failure to converge.
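The following simplified base R sketch shows the multiplicative model that the Li-Wong method fits for a single probeset: y_ij = theta_i * phi_j + error. Here theta_i is the expression index for array i, and phi_j is the sensitivity of probe j, constrained so that the squared phi_j sum to the number of probes. The fit uses alternating least squares. It omits dChip's outlier handling and is not the affy `liwong` implementation. `fit_liwong_sketch`, `max_iter` and `tol` are hypothetical names. Each phi_j is estimated from only as many values as there are arrays, which is one reason a small data set can give unstable or non-converging fits.

```r
# Simplified Li-Wong style fit for one probeset (assumption: no outlier
# removal, plain alternating least squares; not the affy/dChip code).
# y: arrays x probes matrix of PM (or PM - MM) intensities
fit_liwong_sketch <- function(y, max_iter = 100, tol = 1e-8) {
  n_probe <- ncol(y)
  phi <- rep(1, n_probe)    # start with equal probe sensitivities
  theta <- rep(0, nrow(y))
  for (iter in seq_len(max_iter)) {
    theta_old <- theta
    # Least-squares update of theta given phi, then phi given theta
    theta <- drop(y %*% phi) / sum(phi^2)
    phi <- drop(crossprod(y, theta)) / sum(theta^2)
    # Rescale so that sum(phi^2) == n_probe; theta * phi is unchanged
    s <- sqrt(n_probe / sum(phi^2))
    phi <- phi * s
    theta <- theta / s
    if (max(abs(theta - theta_old)) < tol) {
      return(list(theta = theta, phi = phi, iterations = iter,
                  converged = TRUE))
    }
  }
  list(theta = theta, phi = phi, iterations = max_iter, converged = FALSE)
}
```

On an exactly rank-one matrix such as `outer(c(1, 2, 3), c(0.5, 1, 1.5))`, the fit converges, and `outer(theta, phi)` reproduces the input. With noisy data from few arrays, the `converged` flag and the iteration count are worth inspecting.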


At 11:01 AM 4/7/2005, Adaikalavan Ramasamy wrote:
>I am trying to reproduce the dChip expression measure from the dChip
>software with Bioconductor packages. I am aware that dChip is not open
>source, but I would like to get as close as I can. So I compared the
>dChip expression measures computed by both programs on a small dataset
>of 12 arrays with approximately 16000 probesets.
>Going through the mailing list archive, I found that I can pass the
>following combinations of parameter values to expresso:
>         model   pmcorrect.method   bgcorrect.method
>         1       "pmonly"           "none"
>         2       "subtractmm"       "none"
>         3       "pmonly"           "mas"
>         4       "subtractmm"       "mas"
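>with the following generic call to expresso:
>   library(affy)
>   # model 1 shown; substitute the pmcorrect.method / bgcorrect.method
>   # pairs from the table above for models 2-4
>   eset <- expresso( ReadAffy(), normalize.method="invariantset",
>             bgcorrect.method="none", pmcorrect.method="pmonly",
>             summary.method="liwong"
>           )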
>The correlations are high and similar across models (around 0.90). I
>have attached both the scatterplot and the hexbin plot of the expression
>measures from the two programs under each model, with the line of
>identity in red. They suggest that:
>a) the majority of the values are concentrated in the lower range
>b) the values at the higher end appear highly correlated, but they are
>not identical
>c) the MM-subtracted data show more disagreement in the lower range but
>lie much closer to the line of identity in the higher range
>d) MAS5 background correction does not appear to make much difference
>Can other members of the list comment on
>a) whether they have seen similar findings
>b) whether these results are expected and sensible
>c) what else I can try to increase the reproducibility
>Eventually I plan to apply Bioconductor's version of the dChip expression
>measure to a few other datasets, so it would be useful to use the
>Bioconductor settings that reproduce dChip most closely.
>Thank you very much.
>Regards, Adai
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111
