[BioC] an error in normalize.quantiles {preprocessCore} function?

Fri Apr 16 17:39:58 CEST 2010

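Without actually looking at your data, a reasonable explanation for what
you have observed is the handling of ties. The algorithm ensures
that values that are equal on input in a given column are also equal on
output. A column containing ties therefore does not receive exactly the
same set of normalized values as the other columns, so its mean can
differ slightly from theirs.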
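
To make the effect of ties concrete, here is a minimal base-R sketch of
quantile normalization on a toy matrix. It is an illustration under stated
assumptions, not preprocessCore's implementation: the function name
quantile_normalize_sketch is made up, and the tie rule used (each tied group
gets the reference value at its average rank, averaging the two neighbours
when that rank falls halfway) is one plausible choice that I have not
checked against the package source.

```r
# Simplified quantile normalization (illustration only, hypothetical helper).
quantile_normalize_sketch <- function(X) {
  # Reference distribution: mean of the sorted columns, rank by rank.
  ref <- rowMeans(apply(X, 2, sort))
  apply(X, 2, function(col) {
    # Tied inputs share one (average) rank, so they get one common output.
    r <- rank(col, ties.method = "average")
    # Average ranks are whole or half numbers; for a half rank take the
    # mean of the two neighbouring reference values.
    (ref[floor(r)] + ref[ceiling(r)]) / 2
  })
}

# Column "a" has no ties; column "b" has a three-way tie.
X <- cbind(a = c(1, 2, 3, 9, 10),
           b = c(0, 5, 5, 5, 6))
N <- quantile_normalize_sketch(X)
print(N)
print(colMeans(N))   # a: 4.6, b: 4.1
```

Column "a" receives every reference value exactly once, while the three tied
entries in "b" all receive the same reference value, so the two column means
no longer coincide.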

> set.seed(1)
> X <- rnorm(100000)
> X <- round(X,3)  ### This creates a bunch of non-unique values.
> X <- matrix(X,ncol=10)
> library(preprocessCore)
> X.norm <- normalize.quantiles(X)
> colMeans(X.norm)
 [1] -0.00224544 -0.00224477 -0.00224519 -0.00224287 -0.00224322 -0.00224448
 [7] -0.00224846 -0.00224612 -0.00224265 -0.00224595
> set.seed(1)
> X <- rnorm(100000)
> X <- matrix(X,ncol=10)  ## no rounding here so every value is unique
> X.norm <- normalize.quantiles(X)
> colMeans(X.norm)
 [1] -0.002244083 -0.002244083 -0.002244083 -0.002244083 -0.002244083
 [6] -0.002244083 -0.002244083 -0.002244083 -0.002244083 -0.002244083
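
In the run with ties the column means agree only to about three significant
digits, so an exact == comparison against the first column will flag them as
different. A tolerance-based check is one way to separate such tie-induced
wobble from a genuinely deviating column. The sketch below reuses the means
printed in the first run above; the helper name means_agree and the default
tolerance of 1e-5 are illustrative assumptions, not part of preprocessCore.
Note also that in current versions of R, mean() applied to a data frame no
longer returns per-column means; colMeans() is the replacement.

```r
# Column means copied from the first (rounded, tie-heavy) run above.
m_ties <- c(-0.00224544, -0.00224477, -0.00224519, -0.00224287, -0.00224322,
            -0.00224448, -0.00224846, -0.00224612, -0.00224265, -0.00224595)

# Exact comparison: only the first element matches itself.
print(m_ties == m_ties[1])

# Hypothetical helper: TRUE where a mean lies within `tol` of the median
# of all column means. The tolerance is a judgment call for your data.
means_agree <- function(m, tol = 1e-5) abs(m - median(m)) <= tol

print(all(means_agree(m_ties)))   # TRUE
print(diff(range(m_ties)))        # total spread of the column means
```

For a normalized matrix or data frame, the same check would be applied to
colMeans(alldata.q).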

>
> Hi all,
>
>  I am using normalize.quantiles in the preprocessCore package
> to process my data. When I averaged the expression values
> of each chip to visualize the result of quantile normalization,
> I found that one chip seems to have a different average
> expression value from the others. I have uploaded the image
> to imageshack:
> http://img685.imageshack.us/img685/8320/mean.gif
> The average expression value of case no. 184 is clearly
> different from that of the other cases.
>
>  After checking the normalized data, I found two cells that
> apparently should be 2.287524785 and 2.287870326 but have
> both been replaced with 2.28769392. I am not sure what is
> causing the problem. I have tried the normalization on
> two different computers, one with R 2.9.1 and preprocessCore
> 1.6 on an x64 system, and the other with R 2.10.1 and
> preprocessCore 1.8.0 on an x86 system, and the results
> are identical. My code is as follows:
>
> library(preprocessCore)
> alldata.q=as.matrix(alldata)
> alldata.q=normalize.quantiles(alldata.q)
> alldata.q=data.frame(alldata.q)
> row.names(alldata.q)=row.names(alldata)
> names(alldata.q)=names(alldata)
> plot(mean(alldata.q))
>
> To identify which chip differs, I used the following
> code:
>
> mean(alldata.q)==mean(alldata.q)[1]
>
> The results are all TRUE, except for one FALSE for case
> no. 184. I am not sure whether there is an error in my code
> or in the function itself.
>
>  To help reproduce the problem, I have uploaded the data
> elsewhere, since a 31 MB file is unlikely to go through as an
> attachment. Please find the file at the following URL:
> http://webhd.ndmctsgh.edu.tw/invite/tw/webhd/bNzQ5OS85ODkxNi8xMjcxNDExNjEw
>
> If any further information is needed to clarify the problem, please
> let me know.
>
> kindest regards,
> Tseng, Chih-hao