[R] EM algorithm in R

Sun Mar 21 20:09:11 CET 2010

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of tj
> Sent: Saturday, March 20, 2010 10:35 AM
> To: r-help at r-project.org
> Subject: Re: [R] EM algorithm in R
> 
> 
> here is my program... Im trying to fit 1 component to 6 components in each
> of
> the 300 generated samples. Each sample has size=200. For each of the 300
> generated samples and for each modeled component (v=1,2,3,4,5,6), I will
> get
> estimates of the parameters that will maximize the likelihood, AND then I
> will determine when AIC obtained its minimum value - - is it in the 1
> component or 2 components or ...... 6 components? Then I will count the
> times that AIC has its minimum at component=2.  I will do the same for BIC
> and AIC.
> 
> MY program right now:
> 
> h=55555
> tol=0.0001
> size=200
> B=300
> mean1=-0.8
> mean2=0.8
> q=c(0.25,0.75,0,0,0,0)             #i will use this later for the qth
> quantile
> mu=0
> a1=0; a2=0; a3=0; a4=0; a5=0; a6=0
> mu1s=0 ; mu2s=0 ; mu3s=0 ; mu4s=0 ; mu5s=0 ; mu6s=0
> as1=rep(NA,B) ; as2=rep(NA,B); as3=rep(NA,B); as4=rep(NA,B);
> as5=rep(NA,B);
> as6=rep(NA,B)
> vs=0
> itrs=0
> AIC=0
> BIC=0
> CAIC=0
> Aicno=0
> Bicno=0
> Caicno=0
> 
> mylog=function(y,mu1,mu2,mu3,mu4,mu5,mu6,var,a1,a2,a3,a4,a5,a6){
> sum(log(
> (a1/sqrt(2*pi*var))*(exp(-((y-mu1)^2)/(2*var))) +
> (a2/sqrt(2*pi*var))*(exp(-((y-mu2)^2)/(2*var))) +
> (a3/sqrt(2*pi*var))*(exp(-((y-mu3)^2)/(2*var))) +
> (a4/sqrt(2*pi*var))*(exp(-((y-mu4)^2)/(2*var))) +
> (a5/sqrt(2*pi*var))*(exp(-((y-mu5)^2)/(2*var))) +
> (a6/sqrt(2*pi*var))*(exp(-((y-mu6)^2)/(2*var)))
> ),na.rm=TRUE)
> }
> 
> set.seed(h)
> 
> for (j in 1:B){                            #generating my sample
> 
> t=rbinom(1,200,0.5)
> 
> y1=rnorm(mean=mean1,sd=1,n=t)
> 
> y2=rnorm(mean=mean2,sd=1,n=200-t)
> 
> y=c(y1,y2)
> 
> var=var(y)
> mu1=as.numeric(quantile(y,q[1]))   # setting my starting values for the
> means
> mu2=as.numeric(quantile(y,q[2]))
> mu3=as.numeric(quantile(y,q[3]))
> mu4=as.numeric(quantile(y,q[4]))
> mu5=as.numeric(quantile(y,q[5]))
> mu6=as.numeric(quantile(y,q[6]))
> 
> for (v in 1:6)
> {
> 
> for (i in 2:600){                            #my maximum number of
> iteration
> to get the maximum likelihood estimates for my parameters is 600
> 
> a1[v]=ifelse(1>v,NA,1/v)                #here, i set my starting values
> for
> proportions as 1/v, where v is the  number of components. Example, When
> the
> number
> a2[v]=ifelse(2>v,NA,1/v)                 #of components is 2, the program
> will only give two starting values for proportions. I'm having a problem
> when the
> a3[v]=ifelse(3>v,NA,1/v)                 #program gives NA... there are
> problems encountered in the preceding procedures.
> a4[v]=ifelse(4>v,NA,1/v)
> a5[v]=ifelse(5>v,NA,1/v)
> a6[v]=ifelse(6>v,NA,1/v)
> 
> ms=
> c(a1[i-1]*dnorm(y,mu1[i-1],sqrt(var[i-1])),a2[i-1]*dnorm(y,mu2[i-
> 1],sqrt(var[i-1])),
>     a3[i-1]*dnorm(y,mu3[i-1],sqrt(var[i-1])),
> a4[i-1]*dnorm(y,mu4[i-1],sqrt(var[i-1])),
>     a5[i-1]*dnorm(y,mu5[i-1],sqrt(var[i-1])),
> a6[i-1]*dnorm(y,mu6[i-1],sqrt(var[i-1])))
> m=as.numeric(na.omit(ms))
> 
> B = m/sum(m)
> B1 = B[1:size]
> B2 = B[size+1:size*2]
> B3 = B[size*2+1:size*3]
> B4 = B[size*3+1:size*4]
> B5 = B[size*4+1:size*5]
> B6 = B[size*5+1:size*6]
> 
> mu1[i]=sum(B1*y)/sum(B1)
> mu2[i]=sum(B2*y)/sum(B2)
> mu3[i]=sum(B3*y)/sum(B3)
> mu4[i]=sum(B4*y)/sum(B4)
> mu5[i]=sum(B5*y)/sum(B5)
> mu6[i]=sum(B6*y)/sum(B6)
> 
> a1[i]=sum(B1)/size
> a2[i]=sum(B2)/size
> a3[i]=sum(B3)/size
> a4[i]=sum(B4)/size
> a5[i]=sum(B5)/size
> a6[i]=sum(B6)/size
> 
> var[i]=sum((B1*(y-mu1[i])^2+ B2*(y-mu2[i])^2 + B3*(y-mu3[i])^2+
>      B4*(y-mu4[i])^2 + B5*(y-mu5[i])^2 + B6*(y-mu6[i])^2),na.rm=TRUE)/size
> 
> if(abs(mylog(y,mu1[i],mu2[i],mu3[i],mu4[i],mu5[i],mu6[i],var[i],
>        a1[i],a2[i],a3[i],a4[i],a5[i],a6[i])-
> 
> mylog(y,mu1[i-1],mu2[i-1],mu3[i-1],mu4[i-1],mu5[i-1],mu6[i-1],var[i-1],
>        a1[i-1],a2[i-1],a3[i-1],a4[i-1],a5[i-1],a6[i-1]))
> <=tol)
> break()
> 
> }
> f[j]=mylog(y,mu1s[j],mu2s[j],mu3s[j],mu4s[j],mu5s[j],mu6s[j],vs[j],
>        as1[j],as2[j],as3[j],as4[j],as5[j],as6[j])
> AIC[v]=f[j]-v
> BIC[v]=f[j]-(v/2)*log(size)
> CAIC[v]=f[j]-(v/2)*log(size)-(v/2)
> if (AIC[v]< AIC[v+1])  # i need to record the value of v with minimum AIC
> if (BIC[v]< BIC[v+1])  # i need to record the value of v with minimum AIC
> if (CAIC[v]< CAIC[v+1]) #i need to record the value of v with minimum AIC
> 
> }
> 
> Aicno[j]=v    #storage for the value of v when minimum AIC was obtained
> Bicno[j]=v    #storage for the value of v when minimum AIC was obtained
> Caicno[j]=v   #storage for the value of v when minimum AIC was obtained
> as1[j]=a1[i]
> as2[j]=a2[i]
> as3[j]=a3[i]
> as4[j]=a4[i]
> as5[j]=a5[i]
> as6[j]=a6[i]
> mu1s[j]=mu1[i]
> mu2s[j]=mu2[i]
> mu3s[j]=mu3[i]
> mu4s[j]=mu4[i]
> mu5s[j]=mu5[i]
> mu6s[j]=mu6[i]
> vs[j]=var[i]
> 
> }
> 
> Aicno        # so for this, I wish to obtain 300 values of k, where k can
> be
> equal to 1,2,3,4,5,6, corresponding to the component with lowest AIC
> Bicno        # so for this, I wish to obtain 300 values of k, where k can
> be
> equal to 1,2,3,4,5,6, corresponding to the component with lowest BIC
> Caicno     # so for this, I wish to obtain 300 values of k, where k can be
> equal to 1,2,3,4,5,6, corresponding to the component with lowest CAIC
> 

This is helpful, because we can now help you with R and not your homework.  I haven't followed your whole program, but looking at 

> B = m/sum(m)
> B1 = B[1:size]
> B2 = B[size+1:size*2]
> B3 = B[size*2+1:size*3]
> B4 = B[size*3+1:size*4]
> B5 = B[size*4+1:size*5]
> B6 = B[size*5+1:size*6]

You need to be care about precedence of operators.  Type the following after making sure size has a value.  

size+1:size*2

I don't think it is doing what you intended.  A judicious use of parentheses will help.

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA