[R] how to make simulation faster

jim holtman jholtman at gmail.com
Fri Oct 26 19:03:14 CEST 2012

The code you posted was not runnable.  'r' and at least 'Zi' were missing.

The 'total time' is the amount of the elapsed time that it was
sampling with the given function.  The "self time" is how much time
was actually spent in that function.

>From your data, one of the big hitter is the "factor" function.  This
is probably within the "aggregate" function since this is where most
of the "total time" is being spent.  You may want to look at what you
are trying to do with the "aggregate" and see if there is another way
of doing it.

You seem to have an object "i" being used in the aggregate that does
not seem to be defined.

There are probably other ways of speeding it up.  Here is one that
compares 'aggregate' to 'sapply' to do the same thing:

> x <- as.numeric(1:1e6)
> z <- sample(100, 1e6, TRUE)
> system.time(r1 <- aggregate(x, list(z), sum))
   user  system elapsed
   3.62    0.05    3.67
> system.time(r2 <- sapply(split(x, z), sum))
   user  system elapsed
   0.56    0.00    0.56

So this is about a 6X increase in performance.

On Fri, Oct 26, 2012 at 9:08 AM, stats12 <skarmv at gmail.com> wrote:
> Hi,
> Thank you for your reply. I updated my post with the code. Also, about
> posting from Nabble, since I am a new user I didn't know about that problem.
> If I post to the mailing list ( r-help at r-project.org), would it get rid of
> that problem?
> output1<-vector("numeric",length(1:r))
> output2<-vector("numeric",length(1:r))
> output3<-vector("numeric",length(1:r))
> output4<-vector("numeric",length(1:r))
> for (m in 1:r){
> ll<-function(p){
> cumhaz<-(time*exp(Zi*p[3]))*p[1]
> cumhaz<-aggregate(cumhaz,by=list(i),FUN=sum)
> lnhaz<-delta*log(exp(Zi*p[3])*p[1])
> lnhaz<-aggregate(lnhaz,by=list(i),FUN=sum)
> lnhaz<-lnhaz[2:(r+1)]
> cumhaz<-cumhaz[2:(r+1)]
> lik<-r[m]*log(p[2])-
> sum((di[,m]+1/p[2])*log(1+cumhaz[,m]*p[2]))+sum(lnhaz[,m])-
> n*log(gamma(1/p[2]))+sum(log(gamma(di[,m]+1/p[2])))
> -lik
> }
> initial<-c(1,0.5,0.5)
> estt<-nlm(ll,initial,hessian=T)
> lambda<-estt$estimate[1]
> theta<-estt$estimate[2]
> beta<-estt$estimate[3]
> hessian<-t$hessian
> cov<-solve(hessian)
> vtheta<-cov[2,2]
> vbeta<-cov[3,3]
> output1[m]<-theta
> output2[m]<-beta
> output3[m]<-vtheta
> output4[m]<-vbeta
> }
> theta<-as.vector(output1)
> beta<-as.vector(output2)
> vtheta<-as.vector(output3)
> vbeta<-as.vector(output4)
> Code would be nice: I struggle to think of an basic MLE fitting that
> would take ~36s per iteration and scale so badly if you are writing
> idiomatic R:
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> Finally, I note you're posting from Nabble. Please include context in
> your reply -- I don't believe Nabble does this automatically, so
> you'll need to manually include it. Most of the regular respondents on
> this list don't use Nabble -- it is a _mailing list_ after all -- so
> we don't get the forum view you do, only emails of the individual
> posts. Combine that with the high volume of posts, and it's quite
> difficult to trace a discussion if we all don't make sure to include
> context.
> --
> View this message in context: http://r.789695.n4.nabble.com/how-to-make-simulation-faster-tp4647492p4647541.html
> Sent from the R help mailing list archive at Nabble.com.
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

More information about the R-help mailing list