[BioC] problem normalizeMedians (limma)

Marcus Davy mdavy at hortresearch.co.nz
Wed Oct 19 02:49:33 CEST 2005

Hi,
Doe this include modifying normalizeMedianDeviations aswell for consistency.
I suspect normalizeMedianDeviations doesn't reach R's limit as quickly.

Marcus

MA <- new("RGList")

n     <- 200
genes <- 1000

MA\$M <- matrix(runif(n*genes, -8, 8), nr=genes, nc=n)
MA\$A <- matrix(runif(n*genes, 1,16), nr=genes, nc=n)

summary(c(MA\$M))
summary(c(MA\$A))

any(is.infinite(normalizeMedians(MA\$A)))

normalizeMedians2 <- function(x)
{
narrays <- NCOL(x)
if (narrays == 1)
return(x)
a.med <- log(apply(x, 2, median, na.rm = TRUE))
a.med <- exp(a.med - mean(a.med))
t(t(x)/a.med)
}

# Subset
normalizeMedians(MA\$A[1:5,1:5])
normalizeMedians2(MA\$A[1:5,1:5])

# Entire dataset
any(is.infinite(normalizeMedians(MA\$A)))
any(is.infinite(normalizeMedians2(MA\$A)))

normalizeMedianDeviations2 <- function(x)
{
narrays <- NCOL(x)
if (narrays == 1)
return(x)
medabs <- function(x) median(abs(as.numeric(x[!is.na(x)])))
xmat.mav <- log(apply(x, 2, medabs))
denom <- mean(xmat.mav)
si <- exp(xmat.mav - denom)
t(t(x)/si)
}

normalizeMedianDeviations(MA\$M[1:5, 1:5])
normalizeMedianDeviations2(MA\$M[1:5, 1:5])

# Entire dataset
any(is.infinite(normalizeMedianDeviations(MA\$M)))
any(is.infinite(normalizeMedianDeviations2(MA\$M)))

On 10/19/05 1:19 PM, "Gordon Smyth" <smyth at wehi.edu.au> wrote:

>
>>> Date: Mon, 17 Oct 2005 18:24:47 -0400
>>> From: Francois Pepin <fpepin at cs.mcgill.ca>
>>> Subject: [BioC] problem normalizeMedians (limma)
>>> To: bioconductor at stat.math.ethz.ch
>>>
>>> Hi everyone,
>>>
>>> I'm not sure what should be done about it, but I've been surprised with
>>> the following behavior with normalizeBetweenArrays:
>>>
>>> (dat is an MAList here)
>>>> dat\$A[1,1]
>>>  8.796361
>>>> dat <- normalizeBetweenArrays(dat)
>>>> dat\$A[1,1]
>>>  Inf
>>>
>>> This is due to the fact that I've got almost 400 arrays here. The
>>> normalizeMedians(x) (which is called by normalizeBetweenArrays) has the
>>> following behavior:
>>>
>>>     a.med <- apply(x, 2, median, na.rm = TRUE)
>>>     a.med <- a.med/(prod(a.med))^(1/narrays)
>>>
>>> multiplying 400 relatively small numbers can easily reach R's limit as
>>> far as doubles are concerned.
>>>
>>>> 6^400
>>>  Inf
>>>
>>> Wouldn't it be better to be working in log space instead?
>
> Yes, you're right. In limma 2.3.2 I will change
>
>     (prod(a.med))^(1/narrays)
>
> to
>
>     exp(mean(log(a.med)))
>
> which should make floating underflow very unlikely.
>
> BTW, the "scale" method has been the default for normalizeBetweenArrays()
> for historic reasons, inherited from the older sma package. I'm going to
> change the default to "Aquantile".
>
> Gordon
>
>>> Francois
>>>
>>>> sessionInfo()
>>> R version 2.1.0, 2005-04-18, x86_64-unknown-linux-gnu
>>>
>>> attached base packages:
>>>  "methods"   "stats"     "graphics"  "grDevices" "utils"
>>> "datasets"
>>>  "base"
>>>
>>> other attached packages:
>>>   limma
>>> "1.9.6"
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

______________________________________________________

The contents of this e-mail are privileged and/or confidenti...{{dropped}}