[BioC] problem normalizeMedians (limma)

Gordon Smyth smyth at wehi.edu.au
Wed Oct 19 02:19:20 CEST 2005


>>Date: Mon, 17 Oct 2005 18:24:47 -0400
>>From: Francois Pepin <fpepin at cs.mcgill.ca>
>>Subject: [BioC] problem normalizeMedians (limma)
>>To: bioconductor at stat.math.ethz.ch
>>
>>Hi everyone,
>>
>>I'm not sure what should be done about it, but I've been surprised with
>>the following behavior with normalizeBetweenArrays:
>>
>>(dat is an MAList here)
>> > dat$A[1,1]
>>[1] 8.796361
>> > dat <- normalizeBetweenArrays(dat)
>> > dat$A[1,1]
>>[1] Inf
>>
>>This is due to the fact that I've got almost 400 arrays here. The
>>normalizeMedians(x) (which is called by normalizeBetweenArrays) has the
>>following behavior:
>>
>>     a.med <- apply(x, 2, median, na.rm = TRUE)
>>     a.med <- a.med/(prod(a.med))^(1/narrays)
>>
>>multiplying 400 relatively small numbers can easily reach R's limit as
>>far as doubles are concerned.
>>
>> > 6^400
>>[1] Inf
>>
>>Wouldn't it be better to be working in log space instead?

Yes, you're right. In limma 2.3.2 I will change

    (prod(a.med))^(1/narrays)

to

    exp(mean(log(a.med)))

which should make floating underflow very unlikely.

BTW, the "scale" method has been the default for normalizeBetweenArrays() 
for historic reasons, inherited from the older sma package. I'm going to 
change the default to "Aquantile".

Gordon

>>Francois
>>
>> > sessionInfo()
>>R version 2.1.0, 2005-04-18, x86_64-unknown-linux-gnu
>>
>>attached base packages:
>>[1] "methods"   "stats"     "graphics"  "grDevices" "utils"
>>"datasets"
>>[7] "base"
>>
>>other attached packages:
>>   limma
>>"1.9.6"



More information about the Bioconductor mailing list