[R] using ddply but preserving some of the outside data

Jarrett Byrnes byrnes at msi.ucsb.edu
Wed Aug 5 21:00:40 CEST 2009


I have a bit of a quandary.  I'm working with a data set for which I
have sampled sites on a variety of dates.  I want to use this data to
get a running average of the sampled values for the current and
previous date.

I originally thought something like ddply would be ideal for this.
However, if I break up my data by date, I cannot then apply a function
that requires information about the previous dates.

I had thought to use a for loop and merge, but that doesn't quite seem  
to be working.

So, my questions are twofold:

1) Is there a way to use something like the plyr library to do this
efficiently?
	1a) Indeed, is there a way to use ddply or its ilk with a function
that returns a vector of values, and then assign the variables you are
splitting by to the whole vector?  Or maybe making each value its own
column in the new data frame, and then using reshape, is the answer.
Hrm.  Seems clunky.

2) Or, can a for loop around a plyr-kind of statement do the trick?
(If so, pointers on why the code below won't work would be welcome.
It, too, seems clunkier than I would like.)
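
For question 1/1a, one possible sketch (not necessarily the best plyr
idiom): split by site only, and have the function return a small data
frame with one row per date.  ddply then repeats the site label on
every returned row.  The helper name running.mean2 and the column name
run.mean are made up for illustration, and the sketch assumes one value
per site per date, with "previous date" meaning the previously sampled
date at that site.

```r
library(plyr)

# Toy data with the same shape as the example in this post
set.seed(1)
a.df <- expand.grid(sites = c("a", "b", "c"), dates = 1:5)
a.df$value <- runif(nrow(a.df), 0, 100)

# Hypothetical helper: for one site's rows, average each date's value
# with the value from the previous date.  The first date has no
# predecessor, so it is averaged on its own.
running.mean2 <- function(df) {
  df <- df[order(df$dates), ]
  prev <- c(NA, head(df$value, -1))   # value at the previous date
  data.frame(dates = df$dates,
             run.mean = rowMeans(cbind(df$value, prev), na.rm = TRUE))
}

# ddply attaches the "sites" label to every row the function returns
res <- ddply(a.df, "sites", running.mean2)
res
```

Splitting by site rather than by date keeps each site's full history
inside the function, which is what a running average needs.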


library(plyr)

sites<-c("a", "b", "c")
dates<-1:5

a.df<-expand.grid(sites=sites, dates=dates)
a.df$value<-runif(15,0,100)
a.df<-as.data.frame(a.df)


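#now, I want to get the average of the values for the current and previous date
mean2<-function(df, date){
	#keep only the rows for this date and the one just before it
	sub.df<-subset(df, df$dates-date<=0 &
				df$dates-date>=-1 )
	return(mean(sub.df$value))
	}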

my.df<-data.frame(sites=NA, dates=NA, V1=NA)
for(a.date in a.df$dates){
	new.df<-ddply(a.df, "sites", function(df) mean2 (df, a.date))
	my.df<-merge(my.df, new.df) #doesn't seem to work
}

my.df
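
For question 2, a sketch of how the loop might be made to work.  It
rests on a few guesses about what goes wrong: merge() joins on shared
column names, so starting from an all-NA row finds no matches; looping
over a.df$dates visits each date once per site rather than once; and
the per-date result never records which date it belongs to.  Here the
pieces are collected in a list and stacked with rbind instead, and
mean2 averages the subset covering the current and previous date.

```r
library(plyr)

# Toy data with the same shape as the example in this post
set.seed(1)
a.df <- expand.grid(sites = c("a", "b", "c"), dates = 1:5)
a.df$value <- runif(nrow(a.df), 0, 100)

# Mean of the values at `date` and at the date just before it
mean2 <- function(df, date) {
  sub.df <- df[df$dates <= date & df$dates >= date - 1, ]
  mean(sub.df$value)                      # average the subset, not all of df
}

pieces <- list()
for (a.date in unique(a.df$dates)) {      # visit each date once
  new.df <- ddply(a.df, "sites", function(df) mean2(df, a.date))
  new.df$dates <- a.date                  # record which date this piece is for
  pieces[[length(pieces) + 1]] <- new.df
}

# Stack the per-date pieces by row; merge() would try to join them instead
my.df <- do.call(rbind, pieces)
my.df
```

This still calls ddply once per date, so splitting by site alone and
computing all dates inside the function is likely the less clunky
route.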



