[R] spending most of my time in assignments?

Duncan Murdoch murdoch.duncan at gmail.com
Fri Dec 20 02:37:43 CET 2013


On 13-12-19 6:37 PM, Ross Boylan wrote:
> My code seems to be spending most of its time in assignment statements,
> in some cases simple assignment of a model frame or model matrix.
>
> Can anyone provide any insights into what's going on, or how to speed
> things up?

You are seeing a lot of time being spent on complex assignments.  For 
example, line 158 is

data(sims.c1[[k]]) <- sp

That makes a function call to `data<-` to do the assignment, and that 
could be slow.  Since it's an S4 method there's a bunch of machinery 
involved in dispatching it; most of that machinery has no line number 
information of its own, so the profiler charges its time to the calling 
line (box.R#158 here) rather than to the file that defines the method.

I can't really suggest how to speed it up.

Duncan Murdoch

>
> For starters, is it possible that the reports are not accurate, or that
> I am misreading them?  In R 3.0.1 (running under ESS):
>   > Rprof(line.profiling=TRUE)
>   > system.time(r <- totalEffect(dodata[[1]], dodata[[2]], 1:3, 4))
>      user  system elapsed
>    21.629   0.756  22.469
>   > Rprof(NULL)
>   > summaryRprof(lines="both")
>   $by.self
>                              self.time self.pct total.time total.pct
>   box.R#158                       6.74    29.56      13.06     57.28
>   simulator.multinomial.R#64      2.92    12.81       2.96     12.98
>   simulator.multinomial.R#63      2.76    12.11       2.76     12.11
>   box.R#171                       2.54    11.14       5.08     22.28
>   simulator.d1.R#70               0.98     4.30       0.98      4.30
>   simulator.d1.R#71               0.98     4.30       0.98      4.30
>   densMap.R#42                    0.72     3.16       0.86      3.77
>   "standardGeneric"               0.52     2.28      11.30     49.56
> ......
>
> Here's some of the code, with comments at the line numbers
> box.R:
>                  sp <- merge(sexpartner, data, by="studyidx")
>                  sp$y <- numFactor(sp$pEthnic)  #I think y is not used but must be present
>                  data(sims.c1[[k]]) <- sp    ###<<<<< line 158
>                  sp0 <- sp
>                  sp <- sim(sims.c1[[k]], i)
>                  ctable[[k]] <- update.c1(ctable[[k]], sp)
>                  if (is.null(i.c1.in)) {
>                      i.c1.in <- match("pEthnic", colnames(sp0))
>                      i.c1.out <- match(c("studyidx", "n", "pEthnic"), colnames(sp))
>                  }
>                  sp0 <- merge(sp0[,-i.c1.in], sp[,i.c1.out], by=c("studyidx", "n"))
>                  # d1
>                  sp0 <- sp0[sp0$pIsMale == 1,]
>                  # avoid lots of conversion warnings
>                  sp0$pEthnic <- factor(sp0$pEthnic, levels=partRaceLevels)
>                  data(sims.d1[[k]]) <- sp0    ###<<<<< line 171
>                  sp <- sim(sims.d1[[k]], i)
>                  dtable[[k]] <- update.d1(dtable[[k]], sp)
>                  rngstate[[k]] <- .Random.seed
> The timing seems odd since it doesn't appear there's anything to do at
> the 2 lines except invoke data<-, but if that's slow I would expect the
> time to go to the data<- function (in a different file) and not to the
> call.
>
> In fact the other big time items are inside the data<- functions.
> simulator.multinomial.R:
>
>     setMethod("data<-", c("simulator.multinomial", "data.frame"),
>            function(obj, value) {
>      mf <- model.frame(obj@dataFormula, data=value)
>      mf$iCluster <- fromOrig(obj@idmap, as.character(mf$studyidx))
>      if (any(is.na(mf$iCluster)))
>          stop("New studyidx--need to draw from meta distn")
>      mm <- model.matrix(obj@modelFormula, data=mf)
>      obj@data <- mf  ##<<< line 63
>      obj@mm <- mm    ##<<< line 64
>      return(obj)
> })
>
> The mm and data slots have type restrictions, but no other validation
> tests.
> setClass("simulator.multinomial",
>           representation(fit="stanfit", idmap="sIDMap",
>                          modelFormula="formula",
>                          categories="ANY",  # could be factor or character
>                                          # categories should be in the order of their numeric codes in y
>                          # cached results
>                          coef="list",
>                          data="data.frame",
>                          dataFormula="formula",
>                          mm="matrix"))
> Does it matter that, e.g., a model frame is more than a vanilla data frame?
>
> I thought assignment, given R's lazy copying behavior, was essentially
> resetting a pointer, and so should be fast.
>
> Or maybe the time is going to garbage collecting the previous contents
> of the slots?
>
> Ross Boylan
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


