[R] need technique for speeding up R dataframe individual element insertion (no deletion though)

jim holtman jholtman at gmail.com
Thu Aug 13 14:25:01 CEST 2009


First of all, do the strptime conversion once, outside the loop;
strptime accepts the whole column in a single call. I would guess that
if you ran Rprof on the code, most of the time would be in that
routine -- did you run Rprof?
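
A minimal sketch of that first point, using a small made-up data frame in
place of the real 'cam' (the sample values and the name 'dates' are
illustrative assumptions, not taken from the original post):

```r
# Hypothetical stand-in for the poster's data frame 'cam'.
cam <- data.frame(
  end_date = c("01/08/2009", "02/08/2009", "02/08/2009", NA, "05/08/2009"),
  vol      = c(10, 10, 7, 7, 7),
  stringsAsFactors = FALSE
)

# Convert the whole column in one call; strptime is vectorized over its input.
# as.POSIXct turns the result into a plain vector that is cheap to index.
dates <- as.POSIXct(strptime(cam$end_date, "%d/%m/%Y", tz = "UTC"))

# Inside the loop, index the precomputed vector instead of calling strptime.
x1 <- dates[1]
x2 <- dates[2]
```

To check where the time actually goes, the original loop can be wrapped
between Rprof("loop.out") and Rprof(NULL) and then summarised with
summaryRprof("loop.out"); the file name here is arbitrary.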

Also, you are going through the loop one time too many: your ending
value is 'length(cam$end_date)', and you then index one past that in
the loop with 'x2=strptime(cam$end_date[i+1], "%d/%m/%Y");'. Indexing
past the end returns NA, so the final pass does nothing; stop at
'length(cam$end_date) - 1' instead.
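
To make the loop bound concrete, here is a sketch of the same comparison
running only to n - 1, with the date conversion hoisted out. It runs on
made-up data; the sample values, the initial 'levels' value of 0, and the
names 'n', 'dates', 'vol' and 'levels_out' are all assumptions. Collecting
results in a plain vector and assigning the column once at the end is an
extra change of mine, not something suggested above.

```r
# Hypothetical stand-in for the poster's data frame 'cam'.
cam <- data.frame(
  end_date = c("01/08/2009", "02/08/2009", "02/08/2009", NA, "05/08/2009"),
  vol      = c(10, 10, 7, 7, 7),
  stringsAsFactors = FALSE
)
cap <- cam
cap$levels <- 0  # assumed starting value; the original post does not show it

n     <- nrow(cam)
dates <- as.POSIXct(strptime(cam$end_date, "%d/%m/%Y", tz = "UTC"))
vol   <- cam$vol

# Work on a plain vector; updating one data frame cell per pass is costly.
levels_out <- cap$levels

# Stop at n - 1 so that i + 1 never runs past the last row.
for (i in seq_len(max(n - 1, 0))) {
  x1 <- dates[i]
  x2 <- dates[i + 1]
  t1 <- vol[i]
  t2 <- vol[i + 1]
  if (!is.na(x1) && !is.na(x2) && !is.na(t1) && !is.na(t2) &&
      x2 >= x1 && t1 == t2) {
    levels_out[i]     <- 1
    levels_out[i + 1] <- 1
  }
}

# Write the column back to the data frame in a single assignment.
cap$levels <- levels_out
```

With this sample data only rows 1 and 2 are flagged: they are the one
adjacent pair with non-decreasing dates and equal 'vol'.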

FYI -- you don't need the semicolons at the end of the statements.

On Thu, Aug 13, 2009 at 8:07 AM, Ishwor <ishwor.gurung at gmail.com> wrote:
> Hi fellas,
>
> I am working on a dataframe cam and it involves comparison within the
> 2 columns - t1 and t2 on about 20K rows and 14 columns.
>
> ###
> cap = cam; # this doesn't take long. ~1 secs.
>
>
> for( i in 1:length(cam$end_date))
>  {
>    x1=strptime(cam$end_date[i], "%d/%m/%Y");
>    x2=strptime(cam$end_date[i+1], "%d/%m/%Y");
>
>    t1= cam$vol[i];
>    t2= cam$vol[i+1];
>
>    if(!is.na(x2) && !is.na(x1) && !is.na(t1) && !is.na(t2))
>    {
>      if( (x2>=x1) && (t1==t2) ) # date and vol
>      {
>        cap$levels[i]=1; #make change to specific dataframe cell
>        cap$levels[i+1]=1;
>      }
>    }
>  }
> ###
>
> Having coded that, I ran a timing profile on this section, and each
> block of 1000 row comparisons takes ~1.1 minutes on a 2.8 GHz dual-core
> box (a test box we use).
> That works out to ~21 minutes for 20K rows, which is definitely not
> where we want it headed. I believe an optimisation (or even a different
> way of indexing inside the dataframe) can be had inside the innermost
> `if', specifically in `cap$levels[i]=1;', but I am a bit at a loss,
> having scoured the documentation without finding anything of value.
> So my question remains: are there any general or specific changes I can
> make to speed up the code execution dramatically?
>
> Thanks folks.
>
> --
> Regards,
> Ishwor Gurung
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



