[R] lapply or data.table to find a unit's previous transaction

William Rogers whrogers73 at gmail.com
Thu Jun 3 04:29:56 CEST 2010


I have a dataset of property transactions that includes the
transaction ID (TranID), property ID (UnitID), and transaction date
(TranDt). I need to create a data frame (or data table) that includes
the previous transaction date, if one exists.
This is an easy problem in SQL, where I just run a sub-query, but I'm
trying to make R my one-stop-shopping program.  The following code
works on a subset of my data, but I can't run this on my full dataset
because my computer runs out of memory after about 30 minutes. (Using
a 32-bit machine.)
Use the following synthetic data for example.

n<- 100
TranID<- lapply(n:(2*n), function(x) (
as.matrix(paste(x, sample(seq(as.Date('2000-01-01'),
as.Date('2010-01-01'), "days"), sample(1:5, 1)), sep= "D"), ncol= 1)))
TranID<- do.call("rbind", TranID)
UnitID<- substr(TranID, 1, nchar(n))
TranDt<- substr(TranID, nchar(n)+2, nchar(n)+11)
Data<- data.frame(TranID= TranID, UnitID= UnitID, TranDt= as.Date(TranDt))

#First I create a list of all the previous transactions by unit

TranList<- as.matrix(Data$TranID, ncol= 1)
PreTran<- lapply(TranList,
  function(x) (with(Data,
  Data[
  UnitID== substr(x, 1, nchar(n))&
  TranDt< Data[TranID== x, "TranDt"], ]
  ))
  )

#I do get warnings about missing data because some transactions have
no predecessor.
#Some transactions have no previous transactions, others have many so
I pick the most recent

BeforeTran<- lapply(seq_along(PreTran), function(x) (
with(PreTran[[x]], PreTran[[x]][which(TranDt== max(TranDt)), ])))

#I need to add the current transaction's TranID to the list so I can merge later

BeforeTran<- lapply(seq_along(PreTran), function(x) (
transform(BeforeTran[[x]], TranID= TranList[x, 1])))

#Finally, I convert from a list to a data frame

BeforeTran<- do.call("rbind", BeforeTran)

#I have used a combination of data.table and for loops, but that seems
cheesey and doesn't preform much better.

library(data.table)

#First I create a list of all the previous transactions by unit

TranList2<- vector(nrow(Data), mode= "list")
names(TranList2)<- levels(Data$TranID)
DataDT<- data.table(Data)

#Use a for loop and data.table to find the date of the previous transaction

for (i in levels(Data$TranID)) {
if (DataDT[UnitID== substr(i, 1, nchar(n))&
   TranDt<= (DataDT[TranID== i, TranDt]),
length(TranDt)]> 1)
TranList2[[i]]<- cbind(TranID= i,
DataDT[UnitID== substr(i, 1, nchar(n))&
TranDt< (DataDT[TranID== i, TranDt]),
list(TranDt= max(TranDt))])
}

#Finally, I convert from a list to a data table

BeforeTran2<- do.call("rbind", TranList2)

#My intution says that this code doesn't take advantage of
data.table's attributes.
#Are there any ideas out there?  Thank you.
#P.S. I've tried plyr and it does not help my memory problem.

--
William H. Rogers



More information about the R-help mailing list