[R] R dataframe and looping help

arun smartpink111 at yahoo.com
Mon Sep 2 23:01:36 CEST 2013


Hi,
You may try this:

dat1<- read.table(text="
CustID TripDate Store Bread Butter Milk Eggs
1 2-Jan-12 a 2 0 2 1 
1 6-Jan-12 c 0 3 3 0 
1 9-Jan-12 a 3 3 0 0
1 31-Mar-13 a 3 0 0 0
2 31-Aug-12 a 0 3 3 0
2 24-Sep-12 a 3 3 0 0
2 25-Sep-12 b 3 0 0 0
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<- dat1[,-c(1:3)]

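res <- lapply(seq_len(ncol(dat2)), function(i) {
  # Key columns plus the i-th item column, keeping the item's name
  x1 <- cbind(dat1[, 1:3], dat2[, i])
  colnames(x1)[4] <- colnames(dat2)[i]
  # Keep only the trips on which the item was bought
  x2 <- x1[x1[, 4] != 0, ]
  within(x2, {
    # Days since the customer's previous trip (NA for the first trip)
    daysbetweentrips <- unlist(tapply(as.Date(x2$TripDate, "%d-%b-%y"), list(x2$CustID),
                                      function(x) c(NA, as.numeric(diff(x)))))
    previoustripstore <- ave(x2$Store, x2$CustID, FUN = function(x) c(NA, x[-length(x)]))
    Nexttripstore <- ave(x2$Store, x2$CustID, FUN = function(x) c(x[-1], NA))
  })
})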


 res
#[[1]]
 # CustID  TripDate Store Bread Nexttripstore previoustripstore daysbetweentrips
#1      1  2-Jan-12     a     2             a              <NA>               NA
#3      1  9-Jan-12     a     3             a                 a                7
#4      1 31-Mar-13     a     3          <NA>                 a              447
#6      2 24-Sep-12     a     3             b              <NA>               NA
#7      2 25-Sep-12     b     3          <NA>                 a                1

#[[2]]
 # CustID  TripDate Store Butter Nexttripstore previoustripstore
#2      1  6-Jan-12     c      3             a              <NA>
#3      1  9-Jan-12     a      3          <NA>                 c
#5      2 31-Aug-12     a      3             a              <NA>
#6      2 24-Sep-12     a      3          <NA>                 a
 # daysbetweentrips
#2               NA
#3                3
#5               NA
#6               24

#[[3]]
 # CustID  TripDate Store Milk Nexttripstore previoustripstore daysbetweentrips
#1      1  2-Jan-12     a    2             c              <NA>               NA
#2      1  6-Jan-12     c    3          <NA>                 a                4
#5      2 31-Aug-12     a    3          <NA>              <NA>               NA

#[[4]]
 # CustID TripDate Store Eggs Nexttripstore previoustripstore daysbetweentrips
#1      1 2-Jan-12     a    1          <NA>              <NA>               NA
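
The code above assumes the rows are already ordered by CustID and trip date (unlist(tapply(...)) returns its values in group order, and diff() follows row order), and it returns an unnamed list. The following is a minimal sketch of the same steps that sorts first and names each list element after its item. The names key_cols, item_cols, item_frame and res_named are introduced here for illustration only and are not part of the original answer.

```r
# Sample data from the post
dat1 <- read.table(text = "
CustID TripDate Store Bread Butter Milk Eggs
1 2-Jan-12 a 2 0 2 1
1 6-Jan-12 c 0 3 3 0
1 9-Jan-12 a 3 3 0 0
1 31-Mar-13 a 3 0 0 0
2 31-Aug-12 a 0 3 3 0
2 24-Sep-12 a 3 3 0 0
2 25-Sep-12 b 3 0 0 0
", header = TRUE, stringsAsFactors = FALSE)

key_cols  <- c("CustID", "TripDate", "Store")
item_cols <- setdiff(names(dat1), key_cols)

# Parse the dates once ("%b" relies on English month abbreviations in the
# current locale), then sort so that per-customer lags follow trip order.
dat1$Date <- as.Date(dat1$TripDate, "%d-%b-%y")
dat1 <- dat1[order(dat1$CustID, dat1$Date, dat1$Store), ]

# Hypothetical helper: build the data frame for one item
item_frame <- function(item) {
  # Keep only the trips on which the item was bought
  x <- dat1[dat1[[item]] != 0, c(key_cols, item, "Date")]
  # ave() returns its results aligned with the original rows
  x$previoustripstore <- ave(x$Store, x$CustID,
                             FUN = function(s) c(NA, s[-length(s)]))
  x$Nexttripstore <- ave(x$Store, x$CustID,
                         FUN = function(s) c(s[-1], NA))
  x$daysbetweentrips <- ave(as.numeric(x$Date), x$CustID,
                            FUN = function(d) c(NA, diff(d)))
  x$Date <- NULL  # drop the helper column
  x
}

res_named <- setNames(lapply(item_cols, item_frame), item_cols)
res_named$Butter
```

Each item's data frame can then be pulled out by name, for example res_named[["Milk"]], which may be more convenient than positional indexing when there are around 100 items.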



A.K.


Hi, I have a very quick question. I have data with sales per category
per trip for each customer at different store locations, like below
(dataset1 from the Excel attachment):

CustID  TripDate   Store  Bread  Butter  Milk  Eggs
1       2-Jan-12   a      2      0       2     1
1       6-Jan-12   c      0      3       3     0
1       9-Jan-12   a      3      3       0     0
1       31-Mar-13  a      3      0       0     0
2       31-Aug-12  a      0      3       3     0
2       24-Sep-12  a      3      3       0     0
2       25-Sep-12  b      3      0       0     0

Here I have shown 4 items and their sales per customer per trip at each
store. However, my data contains around 100 columns with item names.
All I need to do is the following:

1. Create a separate data frame for each item, that is, 100 data frames,
one for each item. The data frame for Butter, for example, will contain
columns 1-3 and the Butter column, filtered to rows where Butter > 0 in
sales (so rows 1, 4 and 7 will be dropped from this data frame).
Likewise for all items. Sample output for Butter (dataset2):

CustID  TripDate   Store  Butter
1       6-Jan-12   c      3
1       9-Jan-12   a      3
2       31-Aug-12  a      3
2       24-Sep-12  a      3

2. In the same loop, create new derived variables within each data frame
for each item, for example a lag variable for TripDate, a lag variable
for the store name in the next trip, the store name in the previous
trip, etc., and also the number of days between trips to each store for
each customer. The dataset needs to be sorted by CustID, TripDate, Store
before creating the derived variables. An example for the Butter data
frame with the new derived variables (dataset3, Book1.xlsx):

CustID  TripDate   Store  Butter  NextTripstore  previoustripstore  daysbetweentrips
1       6-Jan-12   c      3       a              -                  -
1       9-Jan-12   a      3       -              c                  -
2       31-Aug-12  a      3       a              -                  -
2       24-Sep-12  a      3       -              a                  24

The point of creating multiple item-level data frames is that I will use
them iteratively, as I will perform some regression on these datasets
using the same set of variables each time.
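
In the dataset3 example, daysbetweentrips is filled only where the previous Butter trip by the same customer was to the same store (24 days for customer 2 at store a), which suggests the gap is meant per customer and per store. A minimal sketch under that reading follows; grouping on a pasted CustID/Store key is an assumption made here, not something stated in the thread, and the variable names butter, d and grp are illustrative.

```r
# Butter rows from the question (dataset2)
butter <- read.table(text = "
CustID TripDate Store Butter
1 6-Jan-12 c 3
1 9-Jan-12 a 3
2 31-Aug-12 a 3
2 24-Sep-12 a 3
", header = TRUE, stringsAsFactors = FALSE)

# "%b" relies on English month abbreviations in the current locale
d <- as.numeric(as.Date(butter$TripDate, "%d-%b-%y"))

# One group per customer/store pair; rows are assumed to be sorted by
# date within each group, as they are in the sample data
grp <- paste(butter$CustID, butter$Store)
butter$daysbetweentrips <- ave(d, grp, FUN = function(x) c(NA, diff(x)))
butter
```

With the sample rows this leaves NA for the first three trips and 24 for the last one, matching the dataset3 example, whereas grouping by CustID alone gives 3 days for customer 1's second trip.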


