[R] Re: Data frame modification

Petr PIKAL petr.pikal at precheza.cz
Wed Jul 28 15:45:37 CEST 2010


Hi

Why do you insist on loops? R is not C. If you want to use loops, use C or
a similar programming language. In R it is almost always better to take a
whole-object approach. Kind and clever people have already programmed it
(sometimes in C).

x<-rnorm(20)
x[c(10,12,13,17)]<-NA

x
 [1] -1.12423790  0.80641765 -1.02686262  0.71894420 -0.76157153 -0.09612362
 [7]  0.36681907  0.11164870 -1.06308689          NA -1.32903523          NA
[13]          NA  0.43308928 -0.16599726 -1.85594816          NA  0.02117957
[19] -0.58170838  1.45417843

library(zoo)

na.locf(x)
 [1] -1.12423790  0.80641765 -1.02686262  0.71894420 -0.76157153 -0.09612362
 [7]  0.36681907  0.11164870 -1.06308689 -1.06308689 -1.32903523 -1.32903523
[13] -1.32903523  0.43308928 -0.16599726 -1.85594816 -1.85594816  0.02117957
[19] -0.58170838  1.45417843
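
Applied to the original question, where the placeholder is 0 rather than NA, the same approach could be sketched roughly as follows. The data frame D below is a made-up toy example, not the poster's data, and na.rm = FALSE is an assumption on my part so that a leading NA (a zero with nothing before it) is kept rather than dropped.

```r
library(zoo)

# Toy data frame (hypothetical values, not the poster's data)
D <- data.frame(x = c(5, 0, 0, 3, 0, 7), y = 1:6)

# Step 1: mark the zeros as missing
D$x[D$x == 0] <- NA

# Step 2: carry the last observed value forward; na.rm = FALSE keeps
# the vector the same length even if it starts with NA
D$x <- na.locf(D$x, na.rm = FALSE)

D$x
```

Both steps work on the whole column at once, so no explicit loop over rows is needed.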

This would always be quicker than a for loop with a condition checked at
each step.
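
To check this claim on your own machine, a rough comparison could look like the sketch below. It uses only base R so that it runs without zoo; fill_loop and fill_vectorised are hypothetical helper names written for this illustration and are not what na.locf does internally. Timings depend on the machine and R version, so none are shown.

```r
# Hypothetical helpers for illustration only.
fill_loop <- function(x) {
  # Element by element: if a value is NA, copy the previous one.
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

fill_vectorised <- function(x) {
  # For every element, the position of the most recent non-NA value.
  pos <- seq_along(x)
  pos[is.na(x)] <- 0L
  idx <- cummax(pos)
  # Positions before the first non-NA value have idx 0; keep them NA.
  idx[idx == 0L] <- NA
  x[idx]
}

set.seed(1)
x <- rnorm(1e6)
x[sample(length(x), 2e5)] <- NA

system.time(r1 <- fill_loop(x))        # loop with a condition at each step
system.time(r2 <- fill_vectorised(x))  # whole-object version
identical(r1, r2)                      # both give the same result
```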

There was an article in R News, and P. Burns's "The R Inferno" is also worth
a look if you are interested in loop performance.

If you want to see where the time is spent, use Rprof.
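
A minimal sketch of profiling the loop from the original question with Rprof follows. The data frame D and the function fill_zeros_loop are hypothetical reconstructions (the post only says about 43000 rows), and it assumes your R build has profiling enabled. If the run is too short to collect any samples, summaryRprof stops with an error, which the sketch catches.

```r
# Hypothetical reconstruction of the data and loop from the question.
set.seed(1)
n <- 43000
D <- data.frame(x = sample(0:5, n, replace = TRUE), y = rnorm(n))
D$x[1] <- 1L  # make sure the first value is non-zero

fill_zeros_loop <- function(D) {
  for (i in 2:nrow(D)) {
    if (D$x[i] == 0) D$x[i] <- D$x[i - 1]
  }
  D
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)           # start sampling the call stack
D2 <- fill_zeros_loop(D)
Rprof(NULL)                # stop profiling

# by.self lists the time spent inside each function itself
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```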

Regards
Petr
 

siddharth.garg85 at gmail.com wrote on 28.07.2010 15:20:11:

> Thanks for the reply, Petr. I have solved this problem using sapply, but
> what I am trying to understand here is why this code is slow.
> 
> One possible reason is that when I use the assignment operator, i.e.
>    D$x[i]=D$x[i-1]
> it actually makes a new copy of D$x with the modified value.
> 
> Another reason could be that indexed lookups are not very fast in R.
> 
> Regards
> Siddharth
> 
> 
> 
> ------Original Message------
> From: Petr PIKAL
> To: siddharth.garg85 at gmail.com
> Cc: r-help at r-project.org
> Subject: Re: [R] Data frame modification
> Sent: Jul 28, 2010 6:15 PM
> 
> Hi
> 
> r-help-bounces at r-project.org wrote on 28.07.2010 11:30:48:
> 
> > Hi
> > 
> > I am trying to modify a data frame D with columns x and y in such a way
> > that if a value in x is 0, it is replaced with the last non-zero value
> > in x, i.e.
> > 
> > for (i in 2:nrow(D)) {
> >   if (D$x[i] == 0)
> >     D$x[i] <- D$x[i-1]
> > }
> > 
> > The data frame is quite large, about 43000 rows. This operation is
> > taking a large amount of time. Can someone please suggest what the
> > reason might be?
> 
> Bad programming practice? I would suggest using the zoo package and its
> na.locf function after changing all zero values to NA.
> 
> Regards
> Petr
> 
> > 
> > Thanks
> > Regards
> > Siddharth
> > Sent on my BlackBerry® from Vodafone
> 