[BioC] Estimates of RMA parameters

Antille,Nicolas,LAUSANNE,NRC-BAS nicolas.antille at rdls.nestle.com
Thu Aug 7 15:08:55 MEST 2003


Sorry! same message as before but with the correct values!!

Sorry for the inconvenience.

Regards

Nicolas

###############################################################
###############################################################

Hi!

During background subtraction (with RMA), it is assumed that
the the observed intensities (O) are the sum of a signal(S)
and a noise(N) :

		O = S + N

with S follows an exponential ditribution (parameter = alpha) and
N follows a normal distribution (parameters = mu and sigma).
I've tried to assess the quality of these parameters estimates by
simulation. For that, I've chosen the values of the three parameters
and I've generated a large data set (N values):

dataset <- rexp(N,alpha) + rnorm(N,mu,sigma)

After that, i've tried to estimate the values of the parameters like
in RMA:

bg.parameters(dataset, 2^12)

and I've been surprised to discover that the estimates weren't good at all
For example, with alpha=0.01, mu=50, lambda=75 and N=20000, I've obtained
the following estimates:

alpha = 0.037
mu = 95.004
sigma = 112.77

This encouraged me to find another way to estimate the parameters. I've
found
it, but I need your opinion:

First, I've computed the theoretical distribution of O (convolution of S and
N).
I've found the following expression:

	f(o) = alpha * exp(alpha*(mu + 0.5*alpha*sigma^2 - o)) * phi((o - mu
- alpha*sigma^2)/(sigma))

where phi is the cumulative distribution function of the normal
distribution.

Then I've computed the corresponding log-likelihood function. The
computation of maximum
log-likelihood is possible but difficult. That's why I propose to solve it
in a different way.

We have:	E[O] = E[S+N] = E[S]+E[N] = 1/alpha + mu = (appr.)
mean(intensities)
and		Var[O] = Var[S+N] = Var[S] + Var[N] = 1/alpha^2 + sigma^2 =
(appr.) var(intensities)

where we suppose that S and N are independant and where intensities is the
vector of intensities (perfect match).

Then I suggest to make the following substitutions in the log-likelihood
function:

		mu ----> mean(intensities) - 1/alpha
		sigma^2 ---->  var(intensities) - 1/alpha^2

At this point, we have to maximize the log-likelihood which contains just
one unknown parameter : alpha
For the moment, I suggest to find the maximum empirically (compute the
function for a large range of alpha)
In the example, I've computed the function for 1000 values of alpha randomly
selected between 0.001 and 0.1.
I've obtained the following estimates:

alpha = 0.01006
mu = 49.783
sigma = 75.411

It seems much better than the previous estimates.

I'm impatiently waiting for your comments and criticals.

Thank you in advance!

Nicolas

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor



More information about the Bioconductor mailing list