[R] Median on Aggregated data

David Winsemius dwinsemius at comcast.net
Wed Nov 18 23:12:36 CET 2009


On Nov 18, 2009, at 4:55 PM, Satsangi, Vivek (GE Capital) wrote:

> Folks,
>
> I have the following code, which works fine on smaller data sets. For
> larger datasets, it runs out of memory and runs far too slowly because we
> are essentially creating large vectors with rep() and then calling
> median() on them. (I learned this approach from a post on the web.)
>
> Below that, I have written the corresponding SAS code. The SAS code
> works fast because I can just tell proc summary (via the weight
> statement) that the Counts variable is a frequency.
>
> So, the question is: is there a simple way to do the same thing in R? I
> have to run this on a large dataset -- for a small set it is not a
> problem.
>

Not sure, and I see no reproducible dataset (that I recognize), but
Harrell's Hmisc::wtd.quantile might be an alternative approach.
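
For what it is worth, the underlying idea (treat Counts as frequencies instead of expanding with rep()) can also be sketched in base R. This is only a rough illustration, not the Hmisc implementation: the function name freq_median and the toy data frame are made up for the example, and it assumes Paydex is numeric (if it is a factor with numeric labels, convert it first with as.numeric(as.character(Paydex))).

```r
# Median of a vector in which x[i] occurs counts[i] times, computed from
# cumulative counts so that rep(x, counts) is never materialised.
# freq_median is a hypothetical helper name, not part of any package.
freq_median <- function(x, counts) {
  keep <- !is.na(x) & !is.na(counts) & counts > 0
  x <- x[keep]
  counts <- counts[keep]
  if (length(x) == 0) return(NA_real_)
  ord <- order(x)
  x <- x[ord]
  cum <- cumsum(as.numeric(counts[ord]))  # as.numeric guards against integer overflow
  n <- cum[length(cum)]                   # length of the virtual expanded vector
  # Values at the middle position(s) of the virtual expanded vector
  lo <- x[which(cum >= floor((n + 1) / 2))[1]]
  hi <- x[which(cum >= ceiling((n + 1) / 2))[1]]
  (lo + hi) / 2
}

# Toy data, invented for illustration; column names follow the post.
set.seed(1)
dfpaydex <- data.frame(
  PaydexNormingCategory = rep(1:2, each = 6),
  SIC    = rep(c(7941, 8011), times = 6),
  Paydex = sample(20:100, 12),
  Counts = sample(1:50, 12)
)

# One median per (PaydexNormingCategory, SIC) combination, without the
# explicit double loop.
groups <- split(dfpaydex,
                list(dfpaydex$PaydexNormingCategory, dfpaydex$SIC),
                drop = TRUE)
myNorm <- do.call(rbind, lapply(groups, function(d) {
  data.frame(PaydexNormingCategory = d$PaydexNormingCategory[1],
             SIC = d$SIC[1],
             CatMedian = freq_median(d$Paydex, d$Counts))
}))
```

Memory use here grows with the number of distinct rows rather than with sum(Counts), which is presumably where the rep() version runs into trouble.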


>
> ---------------------- Begin R code ------------------------------------
> N <- 1005 * 14;
> myNorm <- data.frame(PaydexNormingCategory = numeric(N),
>    SIC = numeric(N), CatMedian = numeric(N));
>
> k=1;
> #j = 7941;  ## For testing only
> for (j in levels(SIC)){
> for (i in levels(PaydexNormingCategory)){
> myData <- dfpaydex[(PaydexNormingCategory==i) & (SIC==j),];
> myMedian <- with(myData, levels(Paydex)[median(rep(as.numeric(Paydex),
> Counts))]);
> ## Fill row k; myNorm[k] would overwrite column k instead
> myNorm[k, ] <- c( as.numeric(i), as.numeric(j), as.numeric(myMedian) );
> k <- k+1;
> }
> }
>
> ---------------------- Begin SAS code ----------------------------------
>
> proc summary data=SASUser.PaydexNormfull nway;
>
>   class PaydexNormingCategory SIC ;
>   weight Counts;
>  var Paydex;
>
> output out=outstat (drop=_type_ _freq_)
>        median= / autoname;
> run;
>
> ---------------------- End SAS code ------------------------------------
>
> Thanks for your guidance!
>
>
> Vivek Satsangi
> GE Capital
> Americas
>
> GE imagination at work



