[R] working with summarized data

Greg Snow Greg.Snow at intermountainmail.org
Wed Aug 30 18:28:18 CEST 2006


There are functions to do weighted summary statistics in the Hmisc
package (wtd.quantile, ...).
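As a rough sketch of what that might look like (assuming Hmisc is installed, and that the weights are frequency counts; x and w below are made-up illustration data, not from the original question):

```r
# Made-up data: each value in x stands for w identical observations.
x <- c(1, 2, 3, 4, 10)
w <- c(5, 1, 1, 1, 2)

# Reference value: mean of the fully expanded data (fine for tiny examples).
expanded_mean <- mean(rep(x, w))

if (requireNamespace("Hmisc", quietly = TRUE)) {
  # Weighted mean and weighted median without expanding the rows.
  wmean <- Hmisc::wtd.mean(x, weights = w)
  wmed <- Hmisc::wtd.quantile(x, weights = w, probs = 0.5)
  print(wmean)
  print(wmed)
}
```

The weighted mean should agree with the mean of the expanded data; the exact quantile returned depends on the quantile definition wtd.quantile uses (see its type argument).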

For more complicated analyses (but not plots yet), the biglm package has
a bigglm function that accepts the data in chunks; you could write a
function that expands parts of the dataset at a time.
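One possible sketch of such a function, assuming biglm is installed and the weights are positive integer counts. The helper make_expander, the data frame summ, and its column names are hypothetical, made up for illustration. When bigglm is given a function as data, it calls it with reset = TRUE to rewind and with reset = FALSE to fetch the next chunk, stopping when the function returns NULL:

```r
# Made-up summarized data: each row stands for w identical observations.
summ <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2),
  w = c(3L, 1L, 4L, 2L, 5L, 1L)
)

# Hypothetical helper: builds a data function that expands a few
# summarized rows at a time, so the full expansion is never held in memory.
make_expander <- function(df, weight_col, rows_per_chunk = 2) {
  pos <- 0
  function(reset = FALSE) {
    if (reset) {
      pos <<- 0          # rewind to the first row
      return(NULL)
    }
    if (pos >= nrow(df)) return(NULL)   # no more chunks
    idx <- seq(pos + 1, min(pos + rows_per_chunk, nrow(df)))
    pos <<- max(idx)
    chunk <- df[idx, , drop = FALSE]
    # Repeat each row according to its (integer) weight.
    chunk[rep(seq_len(nrow(chunk)), chunk[[weight_col]]), , drop = FALSE]
  }
}

if (requireNamespace("biglm", quietly = TRUE)) {
  fit <- biglm::bigglm(y ~ x, data = make_expander(summ, "w"))
  print(coef(fit))
}
```

The coefficients should match those from lm() on the fully expanded data, which is a handy check on a small example before trusting it on a large one.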

Hope this helps, 


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Rick Bischoff
Sent: Wednesday, August 30, 2006 8:28 AM
To: r-help at stat.math.ethz.ch
Subject: [R] working with summarized data

The data sets I am working with all have a weight variable--i.e., each
row doesn't represent just 1 observation.

With that in mind, nearly all of the graphs and summary statistics are
incorrect for my data, because they don't take into account the weight.

****
For example "median" is incorrect, as the quantiles aren't calculated
with weights:

sum( weights[X < median(X)] ) / sum(weights)

This should be 0.5... of course it's not.
****
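A minimal base-R sketch of the check above, assuming frequency weights. The helper weighted_median is hypothetical (it is not a base R function), and X and weights are made-up values:

```r
# Hypothetical helper: lower weighted median, i.e. the smallest value
# whose cumulative share of the total weight reaches 0.5.
weighted_median <- function(x, w) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  share <- cumsum(w) / sum(w)
  x[which(share >= 0.5)[1]]
}

X <- c(1, 2, 3, 4, 10)
weights <- c(1, 1, 1, 3, 4)

# Weight share strictly below the unweighted median.
below_plain <- sum(weights[X < median(X)]) / sum(weights)

# Weight shares strictly below, and up to, the weighted median.
wm <- weighted_median(X, weights)
below_wm <- sum(weights[X < wm]) / sum(weights)
upto_wm <- sum(weights[X <= wm]) / sum(weights)

print(c(below_plain = below_plain, below_wm = below_wm, upto_wm = upto_wm))
```

With discrete data the share strictly below the weighted median need not be exactly 0.5; the shares below and up to it bracket 0.5 instead.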

Unfortunately, it seems that most (all?) of R's graphics and summary
statistic functions don't take a weight or frequency argument.
(Fortunately the models do...)

Am I completely missing how to do this?  One way would be to replicate
each row in proportion to its weight (e.g. if the weight was 4, we would
add 3 additional copies), but this will get prohibitive pretty quickly as
the dataset grows.
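For illustration, the replication approach could be sketched like this, assuming integer weights (df and its column names are made up):

```r
# Made-up summarized data with integer frequency weights.
df <- data.frame(value = c(10, 20, 30), weight = c(4L, 1L, 2L))

# Repeat each row index `weight` times, then index the data frame with it.
expanded <- df[rep(seq_len(nrow(df)), times = df$weight), ]

# The expanded data has one row per underlying observation, so the
# ordinary unweighted functions now give the intended answers.
n_expanded <- nrow(expanded)
med_expanded <- median(expanded$value)
print(n_expanded)
print(med_expanded)
```

The expanded data frame has sum(df$weight) rows, which is where the memory cost comes from.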


Thanks in advance!



