[BioC] Estimates of RMA parameters

Rafael A. Irizarry ririzarr at jhsph.edu
Thu Aug 7 16:09:36 MEST 2003

it shoudlnt surprise you. the estimation method is very very ad-hoc. the
goal is to
obtain a useful transformation for background adjustment not estimate
parameters efficiently. you could get better estiamtes usng, for exaple,
an EM algorithm, but then RMA would be much much slower and bottom line
results wouldnt change much. furthermore, the
normal + exponential model can be improved. see
http://biosun01.biostat.jhsph.edu/~ririzarr/papers/gcpaper.pdf
for a model that fits beter (not perfect)

-r

On Thu, 7 Aug 2003,
Antille,Nicolas,LAUSANNE,NRC-BAS wrote:

> Hi!
>
> During background subtraction (with RMA), it is assumed that
> the the observed intensities (O) are the sum of a signal(S)
> and a noise(N) :
>
> 		O = S + N
>
> with S follows an exponential ditribution (parameter = alpha) and
> N follows a normal distribution (parameters = mu and sigma).
> I've tried to assess the quality of these parameters estimates by
> simulation. For that, I've chosen the values of the three parameters
> and I've generated a large data set (N values):
>
> dataset <- rexp(N,alpha) + rnorm(N,mu,sigma)
>
> After that, i've tried to estimate the values of the parameters like
> in RMA:
>
> bg.parameters(dataset, 2^12)
>
> and I've been surprised to discover that the estimates weren't good at all
> For example, with alpha=0.0005, mu=500, lambda=75 and N=50000, I've obtained
> the following estimates:
>
> alpha = 0.0029
> mu = 776.64
> sigma = 251.14
>
> This encouraged me to find another way to estimate the parameters. I've
> found
> it, but I need your opinion:
>
> First, I've computed the theoretical distribution of O (convolution of S and
> N).
> I've found the following expression:
>
> 	f(o) = alpha * exp(alpha*(mu + 0.5*alpha*sigma^2 - o)) * phi((o - mu
> - alpha*sigma^2)/(sigma))
>
> where phi is the cumulative distribution function of the normal
> distribution.
>
> Then I've computed the corresponding log-likelihood function. The
> computation of maximum
> log-likelihood is possible but difficult. That's why I propose to solve it
> in a different way.
>
> We have:	E[O] = E[S+N] = E[S]+E[N] = 1/alpha + mu = (appr.)
> mean(intensities)
> and		Var[O] = Var[S+N] = Var[S] + Var[N] = 1/alpha^2 + sigma^2 =
> (appr.) var(intensities)
>
> where we suppose that S and N are independant and where intensities is the
> vector of intensities (perfect match).
>
> Then I suggest to make the following substitutions in the log-likelihood
> function:
>
> 		mu ----> mean(intensities) - 1/alpha
> 		sigma^2 ---->  var(intensities) - 1/alpha^2
>
> At this point, we have to maximize the log-likelihood which contains just
> one unknown parameter : alpha
> For the moment, I suggest to find the maximum empirically (compute the
> function for a large range of alpha)
> In the example, I've computed the function for 5000 values of alpha randomly
> selected between 0.000005 and 0.05.
> I've obtained the following estimates:
>
> alpha = 0.000498
> mu = 501.09
> sigma = 113.84
>
> It seems much better than the previous estimates.
>
>