[R] Ranking within factor subgroups

maneesh deshpande dmaneesh at hotmail.com
Thu Feb 23 04:45:37 CET 2006


Hi Adai,

I think your solution only works if the rows of the data frame are
already ordered by "date", and that ordering matches the one used to
order the levels of factor(df$date)?
As I implied in my question, my data is indeed organized in this
manner, so my current problem is solved.
In the general case, I suppose one could always order the data frame
by date before proceeding?
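
For the general case, here is a minimal sketch that does not depend on
row order. It relies on base R's ave(), which applies a function within
groups and returns the results in the original row order. The example
data (random normal values with shuffled dates) and the "bucket" column
names are made up for illustration; decile.fn follows the function in
the quoted message below.

```r
# Hypothetical example data: rows are deliberately NOT sorted by date.
# Continuous values are used so that the quantile breaks are unique.
set.seed(1)
df <- data.frame(date = sample(rep(1:5, each = 100)),
                 A = rnorm(500, 100, 10),
                 B = rnorm(500, 50, 7),
                 C = rnorm(500, 30, 5))

# Decile function, as in the quoted message.
decile.fn <- function(x, nbreaks = 10) {
  br <- quantile(x, seq(0, 1, length.out = nbreaks + 1), na.rm = TRUE)
  br[1] <- -Inf
  cut(x, br, labels = FALSE)
}

# ave() computes decile.fn within each date and puts every result back
# at the position of the row it came from, whatever the row order is.
cols <- c("A", "B", "C")
df[paste0("bucket", cols)] <-
  lapply(df[cols], function(v) ave(v, df$date, FUN = decile.fn))

head(df)
```

Sorting first, e.g. df <- df[order(df$date), ], and then using the
unlist(tapply(...)) approach should give the same buckets, but it
changes the row order of the data frame, which ave() avoids.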

Thanks,

Maneesh


>From: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
>Reply-To: ramasamy at cancer.org.uk
>To: maneesh deshpande <dmaneesh at hotmail.com>
>CC: r-help at stat.math.ethz.ch
>Subject: Re: [R]  Ranking within factor subgroups
>Date: Wed, 22 Feb 2006 03:44:45 +0000
>
>It might help to give a simple reproducible example in the future. For
>example
>
>  df <- cbind.data.frame( date=rep( 1:5, each=100 ), A=rpois(500, 100),
>                          B=rpois(500, 50), C=rpois(500, 30) )
>
>might generate something like
>
>	    date   A  B  C
>	  1    1  93 51 32
>	  2    1  95 51 30
>	  3    1 102 59 28
>	  4    1 105 52 32
>	  5    1 105 53 26
>	  6    1  99 59 37
>	...    . ... .. ..
>	495    5 100 57 19
>	496    5  96 47 44
>	497    5 111 56 35
>	498    5 105 49 23
>	499    5 105 61 30
>	500    5  92 53 32
>
>Here is my proposed solution. Can you double-check it against your existing
>functions to confirm that the results agree?
>
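>    decile.fn <- function(x, nbreaks=10){
>      # break points at the 0%, 10%, ..., 100% quantiles of x
>      br     <- quantile( x, seq(0, 1, length.out=nbreaks+1), na.rm=TRUE )
>      br[1]  <- -Inf   # so that the minimum falls into the first bin
>      return( cut(x, br, labels=FALSE) )
>    }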
>
>    out <- apply( df[ ,c("A", "B", "C")], 2,
>                  function(v) unlist( tapply( v, df$date, decile.fn ) ) )
>
>    rownames(out) <- rownames(df)
>    out <- cbind(df$date, out)
>
>Regards, Adai
>
>
>
>On Tue, 2006-02-21 at 21:44 -0500, maneesh deshpande wrote:
> > Hi,
> >
> > I have a dataframe, x of the following form:
> >
> > Date      Symbol   A   B   C
> > 20041201  ABC     10  12  15
> > 20041201  DEF      9   5   4
> > ...
> > 20050101  ABC      5   3   1
> > 20050101  GHM     12   4   2
> > ...
> >
> > Here A, B and C are properties of a set of symbols recorded for a
> > given date. I want to decile the symbols for each date and property
> > and create another set of columns "bucketA", "bucketB", "bucketC"
> > containing the decile rank for each symbol. The following
> > non-vectorized code does what I want:
> >
> > bucket <- function(data,nBuckets) {
> >      q <- quantile(data,seq(0,1,len=nBuckets+1),na.rm=TRUE)
> >      # lower the first break slightly to ensure there are no extra NAs
> >      q[1] <- q[1] - 0.1
> >      cut(data,q,include.lowest=TRUE,labels=FALSE)
> > }
> >
> > calcDeciles <- function(x,colNames) {
> >   nBuckets <- 10
> >   dates <- unique(x$Date)
> >   for (date in dates) {
> >     iVec <- x$Date == date
> >     xx <- x[iVec,]
> >     for (colName in colNames) {
> >       data <- xx[,colName]
> >       bColName <- paste("bucket",colName,sep="")
> >       x[iVec,bColName] <- bucket(data,nBuckets)
> >     }
> >   }
> >   x
> > }
> >
> > x <- calcDeciles(x,c("A","B","C"))
> >
> >
> > I was wondering if it is possible to vectorize the above function to
> > make it more efficient. I tried
> >
> > rlist <- tapply(x$A,x$Date,bucket,nBuckets=10)
> >
> > but I am not sure how to assign the contents of "rlist" to their
> > appropriate slots in the original data frame.
> >
> > Thanks,
> >
> > Maneesh
>
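
On the original question of how to put the per-date results back into
their slots: one option, sketched here under assumptions, is to pair
split() with unsplit(), which reverses the split using the same grouping
vector. The data below is made up in the shape of the original example
(rows not sorted by Date), and the lapply(split(...)) call computes the
same per-date pieces that the tapply() attempt produces.

```r
# Hypothetical data in the shape of the original question.
set.seed(2)
x <- data.frame(Date = sample(rep(c(20041201, 20050101, 20050201), each = 50)),
                A = rnorm(150))

# bucket() as in the original question.
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# One vector of decile ranks per Date (a plain list, one element per date).
rlist <- lapply(split(x$A, x$Date), bucket, nBuckets = 10)

# unsplit() places each element back at the rows it was taken from.
x$bucketA <- unsplit(rlist, x$Date)

head(x)
```

Because unsplit() is driven by x$Date itself, the rows do not need to be
sorted by date beforehand.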



