[R] Ranking within factor subgroups

maneesh deshpande dmaneesh at hotmail.com
Wed Feb 22 03:44:47 CET 2006


Hi,

I have a data frame, x, of the following form:

Date      Symbol   A   B   C
20041201  ABC     10  12  15
20041201  DEF      9   5   4
...
20050101  ABC      5   3   1
20050101  GHM     12   4   2
...

Here A, B and C are properties of a set of symbols recorded on a given date.
For each date and each property, I want to assign the symbols to deciles and
create another set of columns, "bucketA", "bucketB" and "bucketC", containing
the decile rank of each symbol. The following non-vectorized code does what I want:

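bucket <- function(data, nBuckets) {
    # break points at the 0, 1/nBuckets, ..., 1 quantiles, ignoring NAs
    q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
    q[1] <- q[1] - 0.1  # need to do this to ensure there are no extra NAs
    # labels = FALSE returns integer bucket codes 1..nBuckets
    cut(data, q, include.lowest = TRUE, labels = FALSE)
}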
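The helper bucket() can be checked in isolation on a made-up input (the vector 1:20 is only an illustration, not part of the original data). With 20 distinct values and 10 buckets, each decile should receive exactly two values.

```r
bucket <- function(data, nBuckets) {
    q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
    q[1] <- q[1] - 0.1
    cut(data, q, include.lowest = TRUE, labels = FALSE)
}

v <- 1:20                  # hypothetical toy input
b <- bucket(v, 10)
print(b)                   # bucket code for each element of v
print(table(b))            # number of values per bucket
```

One caveat: if many symbols share the same value on a date, some quantiles coincide and cut() stops with the error "'breaks' are not unique", so heavily tied data would need separate handling.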

calcDeciles <- function(x, colNames) {
    nBuckets <- 10
    dates <- unique(x$Date)
    for (date in dates) {
        iVec <- x$Date == date          # rows recorded on this date
        xx <- x[iVec, ]
        for (colName in colNames) {
            data <- xx[, colName]
            bColName <- paste("bucket", colName, sep = "")
            # the new column starts as NA and is filled in date by date
            x[iVec, bColName] <- bucket(data, nBuckets)
        }
    }
    x
}

x <- calcDeciles(x, c("A", "B", "C"))
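Because the data above is elided, here is a hypothetical toy data frame in the same layout (two dates, 30 made-up symbols, random values for A, B and C) on which the loop can be run end to end. With 30 distinct values per date, each decile should end up with exactly three symbols per date.

```r
bucket <- function(data, nBuckets) {
    q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
    q[1] <- q[1] - 0.1
    cut(data, q, include.lowest = TRUE, labels = FALSE)
}

calcDeciles <- function(x, colNames) {
    nBuckets <- 10
    for (date in unique(x$Date)) {
        iVec <- x$Date == date          # rows recorded on this date
        xx <- x[iVec, ]
        for (colName in colNames) {
            bColName <- paste("bucket", colName, sep = "")
            x[iVec, bColName] <- bucket(xx[, colName], nBuckets)
        }
    }
    x
}

# Hypothetical toy data: 30 symbols on each of two dates
set.seed(1)
x <- data.frame(
    Date   = rep(c(20041201, 20050101), each = 30),
    Symbol = rep(sprintf("S%02d", 1:30), times = 2),
    A = rnorm(60), B = rnorm(60), C = rnorm(60)
)

x <- calcDeciles(x, c("A", "B", "C"))
print(head(x))
print(table(x$Date, x$bucketA))   # symbols per date and decile
```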


I was wondering whether the above function can be vectorized to make it
more efficient. I tried

rlist <- tapply(x$A, x$Date, bucket, nBuckets = 10)

but I am not sure how to assign the contents of "rlist" back to the
appropriate rows of the original data frame.
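One possible way to drop the explicit loop over dates (a sketch, not taken from the original post; the toy data frame is hypothetical) is ave(), which applies a function within each group and returns the results in the original row order, so they can be assigned directly as a new column. A tapply() result can also be put back into row order with unsplit(), because tapply() and split() both order the groups by the levels of the same factor.

```r
bucket <- function(data, nBuckets) {
    q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
    q[1] <- q[1] - 0.1
    cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Hypothetical toy data in the same layout as the original data frame
set.seed(42)
x <- data.frame(
    Date   = rep(c(20041201, 20050101), each = 30),
    Symbol = rep(sprintf("S%02d", 1:30), times = 2),
    A = rnorm(60), B = rnorm(60), C = rnorm(60)
)

# Option 1: ave() runs bucket() within each Date and keeps the original row order
for (colName in c("A", "B", "C")) {
    x[[paste0("bucket", colName)]] <-
        ave(x[[colName]], x$Date, FUN = function(v) bucket(v, 10))
}

# Option 2: put the per-date list returned by tapply() back into row order
rlist <- tapply(x$A, x$Date, bucket, nBuckets = 10)
bucketA2 <- unsplit(rlist, x$Date)

print(head(x))
print(all(x$bucketA == bucketA2))   # both options agree
```

Note that ave() itself still works group by group (via split() and lapply()), so any speed gain comes mainly from not subsetting the whole data frame for every date; the difference may be small when there are only a few dates.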

Thanks,

Maneesh