[R] generating a rank variable using date in a data.frame: overcoming a date origin error

Gavin Rudge g.rudge at bham.ac.uk
Wed Jan 29 17:16:12 CET 2014


I've got a simple data.frame with a factor variable called 'case', which identifies one subject, and the date of an event ('obs'); each row represents one observation. One case can have many (or few) observations over time in the data set.

I've created a crude data.frame by way of a clunky but reproducible example.

My objective is simply to create a variable that ranks the events within each case in date order, 1 being the earliest up to n being the nth. To this end I've used the 'ave' function as below.
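To make the goal concrete before the full example, here is a tiny hand-made illustration of the result I am after. The data frame 'toy' and the column 'wanted' are made up for illustration only; the ranks are typed in by hand, not computed.

```r
# Hypothetical toy data, separate from the reproducible example further down
toy <- data.frame(
  case = c("A", "B", "A", "B", "A"),
  obs  = as.Date(c("2005-03-01", "2002-07-15", "2001-01-10",
                   "2009-11-30", "2003-06-20"))
)

# Desired within-case rank by date, entered by hand:
# case A has three events (earliest 2001-01-10), case B has two
toy$wanted <- c(3, 1, 1, 2, 2)

print(toy)
```

So within case A the 2001 event should get rank 1, the 2003 event rank 2 and the 2005 event rank 3, and case B should be ranked separately.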

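set.seed(66)
# every day from 2001-01-01 to 2011-12-31
d <- seq(as.Date("2001/01/01"), as.Date("2011/12/31"), "days")
# 200 random event dates
obs <- sample(d, 200, replace = TRUE)
obs <- as.data.frame(obs)
# 200 random case labels, A to H
case <- sample(LETTERS[1:8], 200, replace = TRUE)
case <- as.data.frame(case)
df <- cbind(case, obs)
# this is the line that raises the error
df$rank <- ave(df$obs, df$case, FUN = rank)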

The last line throws one of those "Error in as.Date.numeric(value) : 'origin' must be supplied" errors.

I get why this is happening: I have not explicitly set the date origin when I set up the date variables. My question is, where do I do this? I've tried variations of the above with origin="1900-01-01" in various lines of the code, but I am still getting the error.
Also, by way of a supplementary question: in my actual application I am bringing in a lot of data from .csv files which contain data originally generated by the data owner in Excel. Does this mean that I always need to set the origin to 1 January 1900?

Any help gratefully received,

GavinR


