[R] How to optimizing the code for calculating the launch time

David McPearson dmcp at webmail.co.za
Wed Aug 6 09:59:31 CEST 2014


On Tue, 5 Aug 2014 09:51:56 +0300 Lingyi Ma <lingyi.ma at gmail.com> wrote

> My dataset:
> 
>     Item_Id    Year_Month
>  B65623262     201204
>  B58279745     201204
>  B33671102     201204
>  B36630946     201204
>  B63270151     201204
>  B63270133     201204
> 
> 
> 
>  I have written my code to calculate one more column which is the product
> maturity time as the following:
> 
> launchtime<-function(g){
> 
> su<-data.frame(NULL)    #product maturity time
> 
> g<-g[order(g$Item_Id,g$Year_Month),]
> 
> #get the lauching time
> index2<-unique(g$Item_Id)
> 
> for(u in 1:length(index2)){
> m2<-g[g$Item_Id==index2[u],]
> 
> lt<-numeric(0)
> lt[1]<-0
> year<-as.numeric(substring(m2$Year_Month,1,4))
> month<-as.numeric(substring(m2$Year_Month,5,6))
> if(dim(m2)[1]==1){}else{
>  for(i in 1:(dim(m2)[1]-1)){
> j<-i+1
> lt[j]<-(12*(year[j]-year[1])+month[j])-(month[1])
> }}
> g2<-cbind(m2,lt)
> su<-rbind(su,g2)
> }
> 
> return(su)
> 
> }
> 
> 
> 
> How to optimize my code-. it takes so long time to run.
> 
> 
> Kind regards,
> Tammy
> 
>     [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Please read the posting guide - and the footer to just about every message on
the list (just above): Do not post html - it makes your messages hard to read.

Sample data and an example of the expected output will also go a long way to
help us understand your problem and help you.

Having writ that, welcome to Circles 1, 2, 3, 3.1, and 8.1.30 of The R Inferno
http://www.burns-stat.com/documents/books/the-r-inferno/
Get a copy. Read it. The time it takes will pay off ten-fold in coping with
common R mistakes. (And we should all thank Dr Burns for providing the
download at the best of all possible prices...) (Don't try and understand it
all at once - take note of the bits that make sense.)

To optimise your code:
* get rid of the inner loop (I don't think you need it)
* do not grow objects if possible, especially do not grow them inside a loop
* to get the results (I think) you want, put in the parentheses you (seem to
have) missed.

My version of your code below. Hopefully the comments will be of assistance.

#====================================================================

launchtime <- function(g) {

  g <- cbind(g[order(g$Item_Id,g$Year_Month),], launch = NA]
  # Inside a function you are working with a copy of the data.
  # This will not change the original and will (should be) safe.

  #get the lauching time
  index2 <- unique(g$Item_Id)

  next_row <- 1

  # for(u in 1:length(index2)){
  #   m2<-g[g$Item_Id==index2[u],]

  # for (u in seq_along(index2)) is better, but in this case I prefer
  for (u in index2) {
    m2 <- g[g$Item_Id == index2]

    year<-as.numeric(substring(m2$Year_Month,1,4))
    month<-as.numeric(substring(m2$Year_Month,5,6))
    # hmmm - probably better to use "as.integer()" in this case. 
    # Also, if Year_Month is already numeric, the following might
    # work better (or not...)
    # year <- floor(m2$Year_Month / 100)
    # month <- m2$Year_Month %% 100

    # I don't think you need to loop...
    lt <- 12 * (year - year[1]) + month - month[1]
    # (Parentheses are mine - but I'm sure you need them)

    g$launch[next_row:(next_row + length(lt) - 1)] <- lt
    next_row <- next_row + length(lt)
    # given that we've eliminated the inner loop, you could probably
    # cbind() and rbind() OK - but this should be (at least a bit)
    # faster
  }
  g

}
#
#====================================================================
Cheers.

____________________________________________________________
South Africas premier free email service - www.webmail.co.za 

Cotlands - Shaping tomorrows Heroes http://www.cotlands.org.za/



More information about the R-help mailing list