[BioC] Estimates of RMA parameters

Antille,Nicolas,LAUSANNE,NRC-BAS nicolas.antille at rdls.nestle.com
Thu Aug 7 11:55:19 MEST 2003


During background subtraction (with RMA), it is assumed that
the the observed intensities (O) are the sum of a signal(S)
and a noise(N) :

		O = S + N

with S follows an exponential ditribution (parameter = alpha) and
N follows a normal distribution (parameters = mu and sigma).
I've tried to assess the quality of these parameters estimates by
simulation. For that, I've chosen the values of the three parameters
and I've generated a large data set (N values):

dataset <- rexp(N,alpha) + rnorm(N,mu,sigma)

After that, i've tried to estimate the values of the parameters like
in RMA:

bg.parameters(dataset, 2^12)

and I've been surprised to discover that the estimates weren't good at all
For example, with alpha=0.0005, mu=500, lambda=75 and N=50000, I've obtained
the following estimates:

alpha = 0.0029
mu = 776.64
sigma = 251.14

This encouraged me to find another way to estimate the parameters. I've
it, but I need your opinion:

First, I've computed the theoretical distribution of O (convolution of S and
I've found the following expression:

	f(o) = alpha * exp(alpha*(mu + 0.5*alpha*sigma^2 - o)) * phi((o - mu
- alpha*sigma^2)/(sigma))

where phi is the cumulative distribution function of the normal

Then I've computed the corresponding log-likelihood function. The
computation of maximum
log-likelihood is possible but difficult. That's why I propose to solve it
in a different way.

We have:	E[O] = E[S+N] = E[S]+E[N] = 1/alpha + mu = (appr.)
and		Var[O] = Var[S+N] = Var[S] + Var[N] = 1/alpha^2 + sigma^2 =
(appr.) var(intensities)

where we suppose that S and N are independant and where intensities is the
vector of intensities (perfect match).

Then I suggest to make the following substitutions in the log-likelihood

		mu ----> mean(intensities) - 1/alpha
		sigma^2 ---->  var(intensities) - 1/alpha^2

At this point, we have to maximize the log-likelihood which contains just
one unknown parameter : alpha
For the moment, I suggest to find the maximum empirically (compute the
function for a large range of alpha)
In the example, I've computed the function for 5000 values of alpha randomly
selected between 0.000005 and 0.05.
I've obtained the following estimates:

alpha = 0.000498
mu = 501.09
sigma = 113.84

It seems much better than the previous estimates.

I'm impatiently waiting for your comments and criticals.

Thank you in advance!


More information about the Bioconductor mailing list