[R] rle on large data . . . without a for loop!

Justin Haynes jtor14 at gmail.com
Sat Jun 18 00:55:10 CEST 2011


I think I need to do something like this:

dat<-data.frame(id=rep(1:5,each=200),
	state=sample(1:3,1000,replace=TRUE,prob=c(0.7,0.05,0.25)),
	V1=runif(1000,10,1000),V2=rnorm(1000))
rle.dat<-rle(dat$state)
temp<-1	# first row of the current run
out<-data.frame(id=seq_along(rle.dat$lengths))
for(i in seq_along(rle.dat$lengths)){
	temp2<-temp+rle.dat$lengths[i]-1	# last row of the current run
	out$V1[i]<-mean(dat$V1[temp:temp2])
	out$V2[i]<-sum(dat$V2[temp:temp2])
	out$state[i]<-rle.dat$values[i]
	temp<-temp2+1	# the next run starts on the following row
}

to a very large dataset.  I want to apply a few summary functions to
some variables within a data.frame for each run of a given state.  To
complicate things, I'd like to use plyr and split on the id variable
before I do any of this...

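library(plyr)
loop.func<-function(dat){
  rle.dat<-rle(dat$state)
  temp<-1	# first row of the current run
  # named run so it does not clash with the id column that ddply adds
  out<-data.frame(run=seq_along(rle.dat$lengths))
  for(i in seq_along(rle.dat$lengths)){
	temp2<-temp+rle.dat$lengths[i]-1	# last row of the current run
	out$V1[i]<-mean(dat$V1[temp:temp2])
	out$V2[i]<-sum(dat$V2[temp:temp2])
	out$state[i]<-rle.dat$values[i]
	temp<-temp2+1	# the next run starts on the following row
  }
  return(out)
}
out<-ddply(dat,.(id),loop.func)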
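
For what it's worth, here is a rough loop-free sketch of the same per-run summary. It assumes the goal is one output row per run of identical states: every row gets a run index built from rle(), and the summaries are then computed by that index. The names run.summary, run and toy are made up for illustration, and the toy data only stands in for the real dataset.

```r
# Hypothetical helper: one summary row per run of identical `state` values.
run.summary <- function(dat) {
  r <- rle(dat$state)
  # Run index for every row: 1,1,2,2,2,3,... (one label per run)
  run <- rep(seq_along(r$lengths), r$lengths)
  data.frame(
    run   = seq_along(r$lengths),
    state = r$values,
    n     = r$lengths,
    V1    = as.vector(tapply(dat$V1, run, mean)),  # mean of V1 within each run
    V2    = as.vector(rowsum(dat$V2, run))         # sum of V2 within each run
  )
}

# Small toy input (assumed, not the real data)
toy <- data.frame(state = c(1, 1, 3, 3, 3, 1),
                  V1    = c(10, 20, 30, 40, 50, 60),
                  V2    = c(1, 2, 3, 4, 5, 6))
res <- run.summary(toy)
```

Because the function takes a data.frame and returns a data.frame, it should be usable as the function argument of ddply in place of a loop-based one, though I have not timed it on anything large.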

Mostly, I just don't understand how to use a list (especially in this
instance) in a plyr/apply statement...
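
To make the list part concrete, this is my understanding of the split/apply/combine steps written out in base R, as a minimal sketch rather than a recommended solution. The function count.runs and the toy data are hypothetical; the point is only that split() produces a list of data.frames, lapply() calls a function on each element, and do.call(rbind, ...) stacks the results.

```r
# Hypothetical per-piece function: number of runs and mean of V1 for one id.
count.runs <- function(piece) {
  data.frame(n.runs  = length(rle(piece$state)$lengths),
             V1.mean = mean(piece$V1))
}

# Toy data (assumed): two ids with three rows each
toy <- data.frame(id    = rep(1:2, each = 3),
                  state = c(1, 1, 2, 3, 3, 3),
                  V1    = c(1, 2, 3, 4, 5, 6))

pieces  <- split(toy, toy$id)           # named list of data.frames, one per id
results <- lapply(pieces, count.runs)   # list of one-row data.frames
by.id   <- do.call(rbind, results)      # stacked back into a single data.frame

# With plyr loaded, ddply(toy, .(id), count.runs) performs the same three
# steps in one call and adds the id column itself.
```

In the base R version the id ends up in the row names of by.id rather than in a column.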


Thanks,

Justin


