[R] What is an efficient way to apply the logic below to ~1.2 million records in R?

Ravi Teja raviteja2504 at gmail.com
Sat Sep 19 23:09:45 CEST 2015


Hi,

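I am trying to apply the logic below to generate a flag_1 column on a data
set of ~1.2 million records in R. flag_1 is a running counter: it restarts
at 1 whenever Time_Diff is NA or greater than 12, and otherwise adds 1 to
the previous row's value.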

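Code:

# Assumes A is sorted by customer (and by time within customer).
# flag_1 restarts at 1 when Time_Diff is NA or greater than 12;
# otherwise it is the previous row's flag_1 plus 1.
# The same rule applies whether or not the next row belongs to the
# same customer, so no customer comparison is needed here (comparing
# with A$customer[i + 1] would also give NA on the last row).
A$flag_1 <- NA_integer_
for (i in seq_len(nrow(A)))
{
  if (i == 1 || is.na(A$Time_Diff[i]) || A$Time_Diff[i] > 12)
  {
    A$flag_1[i] <- 1L                    # start a new run
  }
  else
  {
    A$flag_1[i] <- A$flag_1[i - 1] + 1L  # continue the current run
  }
}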
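Part of the cost of the loop above is probably that every A$flag_1[i] <- ...
assignment modifies a column of a data frame, which in R can copy data on
each iteration. A minimal sketch of the same rule working on plain vectors
instead (assuming the column names customer and Time_Diff used in the loop,
and the example data from the table) reads the column once, fills a
preallocated integer vector, and writes it back to the data frame only once:

```r
# Example data from the post; column names follow the loop.
A <- data.frame(
  customer  = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Time_Diff = c(NA, 10, 8, 15, 9, 10, NA, 2, 5)
)

td   <- A$Time_Diff            # read the column once
n    <- length(td)
flag <- integer(n)             # preallocated result vector
for (i in seq_len(n)) {
  if (i == 1 || is.na(td[i]) || td[i] > 12) {
    flag[i] <- 1L              # start a new run
  } else {
    flag[i] <- flag[i - 1] + 1L  # continue the current run
  }
}
A$flag_1 <- flag               # write back to the data frame once
```

This is still an R-level loop, so it does not remove the per-row overhead
entirely; it only avoids repeated data frame modification.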


The resulting data set should look like this:

customer   Time_Diff   flag_1
       1          NA        1
       1          10        2
       1           8        3
       1          15        1
       1           9        2
       1          10        3
       2          NA        1
       2           2        2
       2           5        3
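To make the example reproducible, the table can be entered as a data frame.
The column names here follow the loop (customer, Time_Diff), and
expected_flag_1 is a hypothetical helper vector holding the flag_1 column
from the table, for checking any proposed solution:

```r
# Example input from the table (column names as used in the loop).
A <- data.frame(
  customer  = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Time_Diff = c(NA, 10, 8, 15, 9, 10, NA, 2, 5)
)
# flag_1 values shown in the table, for checking a solution.
expected_flag_1 <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)

# dput() prints R code that recreates A; a common way to share data on R-help.
dput(A)
```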

With the loop above, generating the flag_1 column on ~1.2 million records
would take approximately 60 hours. Is there a more efficient way to
implement this logic in R?
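One possible vectorised approach, offered as a sketch rather than a tested
answer: mark the rows where a new run starts, number the runs with cumsum(),
and count 1, 2, 3, ... within each run with sequence(rle(...)$lengths). The
function name make_flag_1 is hypothetical. It assumes the data are sorted by
customer, and it additionally assumes a run should restart whenever the
customer changes; the loop does not test this explicitly, but when each
customer's first Time_Diff is NA (as in the example) the result is the same.
The 1.2-million-row data set at the end is synthetic, only for timing.

```r
# Hypothetical helper: flag_1 for vectors already sorted by customer.
make_flag_1 <- function(customer, time_diff) {
  n <- length(time_diff)
  if (n == 0) return(integer(0))
  # Assumption: a run also restarts when the customer changes.
  customer_change <- c(TRUE, customer[-1] != customer[-n])
  # A new run starts on a customer change, an NA, or a gap above 12.
  new_run <- customer_change | is.na(time_diff) | time_diff > 12
  run_id <- cumsum(new_run)      # 1, 1, 1, 2, 2, ... one id per run
  # Runs are contiguous, so count up within each run of equal ids.
  sequence(rle(run_id)$lengths)
}

# Example data from the post.
A <- data.frame(
  customer  = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Time_Diff = c(NA, 10, 8, 15, 9, 10, NA, 2, 5)
)
A$flag_1 <- make_flag_1(A$customer, A$Time_Diff)
print(A$flag_1)
# [1] 1 2 3 1 2 3 1 2 3

# Hypothetical synthetic data of 1.2 million rows, only for timing.
set.seed(1)
n_big <- 1.2e6
big <- data.frame(
  customer  = sort(sample(1:100000, n_big, replace = TRUE)),
  Time_Diff = sample(c(NA, 1:20), n_big, replace = TRUE)
)
print(system.time(big$flag_1 <- make_flag_1(big$customer, big$Time_Diff)))
```

Every step (comparison, cumsum(), rle(), sequence()) is a single vectorised
call over the whole column, so there is no per-row R loop.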

Appreciate your help.

Thanks,
Ravi




More information about the R-help mailing list