[R] ragged data.frame? using plyr

Justin Haynes jtor14 at gmail.com
Fri Jun 3 03:03:11 CEST 2011


I have a dataset that looks like:


library(plyr)       # ddply(), dlply(), ldply()
library(reshape2)   # melt()

set.seed(144)
sam<-sample(1000,100)
dat<-data.frame(id=letters[1:10],value=rnorm(1000),day=rep(1:5,each=100))  # day is recycled to 1000 rows

I want to "normalise" it using the following function (unless you have
a better idea...):

# Rescale the values in dframe to mean 1000 and sd 100
adj.values<-function(dframe){
  value_mean<-mean(dframe$value)
  value_sd<-sd(dframe$value)
  norm_value<-(dframe$value-value_mean)/value_sd   # z-score
  score_scale<-100
  score_offset<-1000
  scaled_value<-norm_value*score_scale+score_offset
  names(scaled_value)<-dframe$id                   # label each score with its id
  return(scaled_value)
}
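
For what it's worth, the same transformation can be written more compactly with base R's scale(), which subtracts the mean and divides by the standard deviation. A minimal sketch; the name adj.values2 and the small data frame are hypothetical, for illustration only:

```r
# Hypothetical helper: same result as adj.values(), using scale()
# to compute the z-score (value - mean) / sd in one step.
adj.values2 <- function(dframe, score_scale = 100, score_offset = 1000) {
  z <- as.vector(scale(dframe$value))
  setNames(z * score_scale + score_offset, dframe$id)
}

# Small made-up example: mean 2, sd 1, so the z-scores are -1, 0, 1
df <- data.frame(id = c("a", "b", "c"), value = c(1, 2, 3),
                 stringsAsFactors = FALSE)
adj.values2(df)   # a = 900, b = 1000, c = 1100
```

Making the scale and offset arguments with defaults keeps the 100/1000 choice in one visible place.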

score_out<-ddply(dat,.(day),adj.values)

This gives me a data.frame, all nice and pretty and ready for the following:

score_out.melt<-melt(score_out,id='day')
names(score_out.melt)<-c('day','id','score')

tblscore_mean<-tapply(score_out.melt$score,INDEX=score_out.melt$id,mean)
tblscore_iqr<-tapply(score_out.melt$score,INDEX=score_out.melt$id,IQR)

score_mean_iqr<-data.frame(id=names(tblscore_iqr),mean=tblscore_mean,iqr=tblscore_iqr)

However, as it turns out, my data look more like:

dat<-dat[-sam,]   # drop the 100 sampled rows (the comma selects rows, not columns)

ldply(dlply(dat,.(id,day),adj.values),length)   # number of values per id/day group

So on different days I only have data for some of the ids, which leads
to a "ragged" data.frame.

ddply(dat,.(id,day),adj.values)   # fails: the groups return vectors of different lengths
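
A cross-tabulation of rows per id and day shows the raggedness at a glance. A base R sketch that rebuilds the example data (with the row deletion written as dat[-sam, ]); the name counts is illustrative:

```r
set.seed(144)
sam <- sample(1000, 100)
dat <- data.frame(id = letters[1:10], value = rnorm(1000),
                  day = rep(1:5, each = 100))   # day is recycled to 1000 rows
dat <- dat[-sam, ]                               # drop the 100 sampled rows

# Rows per id (table rows) and day (table columns). Before the
# deletion every cell held 20 rows; now some hold fewer.
counts <- table(dat$id, dat$day)
counts
```

Any cell that drops to zero is an id/day combination with no data at all.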

Can I do something like

ldply(dlply(dat,.(id,day),adj.values),function(x){
  # put in an NA for the places where data are missing?
})
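
One base R way to put in the NAs, assuming the per-group results are plain numeric vectors, is to extend each one to the longest length with `length<-`, which pads with NA. A sketch with a made-up list standing in for the dlply() output (unnamed vectors, for simplicity):

```r
# Made-up stand-in for the list returned by dlply(): unequal lengths
res <- list(a = c(1, 2, 3), b = c(4, 5))

# Length of the longest group
n <- max(sapply(res, length))

# `length<-` extends a vector up to length n, filling with NA
padded <- lapply(res, function(x) {
  length(x) <- n
  x
})

m <- do.call(rbind, padded)  # one row per group, NA where data are missing
m
```

Since every element of padded now has the same length, it should also be acceptable to ldply() in place of do.call(rbind, ...).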


To give you a sense of where this is going, I'm eventually going to
plot the mean score of each id over the whole period against its IQR
(again, unless you have a better idea...).
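
A possibly simpler route, sketched under the assumption that the goal is the per-id mean and IQR of scores normalised within each day (as in the ddply(dat,.(day),adj.values) version): stay in long format, score each row within its day with ave(), and summarise by id with tapply(). Unequal group sizes then never need padding. Base R only; tbl_mean and tbl_iqr are illustrative names:

```r
set.seed(144)
sam <- sample(1000, 100)
dat <- data.frame(id = letters[1:10], value = rnorm(1000),
                  day = rep(1:5, each = 100))   # day is recycled to 1000 rows
dat <- dat[-sam, ]                               # drop the 100 sampled rows

# Score each row within its day: same transformation as adj.values()
dat$score <- ave(dat$value, dat$day,
                 FUN = function(v) 1000 + 100 * (v - mean(v)) / sd(v))

# Mean and IQR of each id's scores over the whole period
tbl_mean <- tapply(dat$score, dat$id, mean)
tbl_iqr  <- tapply(dat$score, dat$id, IQR)
score_mean_iqr <- data.frame(id = names(tbl_mean),
                             mean = as.vector(tbl_mean),
                             iqr = as.vector(tbl_iqr))

# One labelled point per id: mean score against its IQR
plot(score_mean_iqr$mean, score_mean_iqr$iqr, type = "n",
     xlab = "mean score", ylab = "IQR of score")
text(score_mean_iqr$mean, score_mean_iqr$iqr, labels = score_mean_iqr$id)
```

Plotting with type = "n" and then text() draws each id as its own label rather than as an anonymous point.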


As always,

thanks for your help!

Justin


