[R] working with summarized data
Greg.Snow at intermountainmail.org
Wed Aug 30 18:28:18 CEST 2006
There are functions to do weighted summary statistics in the Hmisc
package (wtd.quantile, ...).
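To show what a weighted quantile computes, here is a minimal base-R sketch. `weighted_quantile` is a hypothetical helper written for this illustration, not Hmisc's implementation. It uses the simplest definition (the smallest value whose cumulative weight share reaches the requested probability), so it may differ slightly from `wtd.quantile`, which interpolates. The data values are made up.

```r
# Hypothetical helper (not part of Hmisc): weighted quantile taken from
# the cumulative share of the weights, with no interpolation.
weighted_quantile <- function(x, w, probs = 0.5) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  # For each probability, take the smallest x whose cumulative share reaches it
  sapply(probs, function(p) x[which(cum_share >= p)[1]])
}

x <- c(1, 2, 3, 10)
w <- c(1, 1, 1, 7)   # the value 10 stands for 7 observations

median(x)                      # unweighted: ignores w
weighted_quantile(x, w, 0.5)   # weighted
median(rep(x, times = w))      # expanding the rows agrees for this example
```

With Hmisc installed, `wtd.quantile(x, weights = w, probs = 0.5)` would be the packaged route.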
For more complicated analyses (but not plots yet), the biglm package has
a bigglm function that expects the data in chunks; you could write a
function that expands parts of the dataset at a time.
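A rough sketch of that idea, under some assumptions: the column names `y`, `x`, and `freq`, the toy data, and the helper `make_expanding_reader` are all invented for illustration. The reader follows the convention bigglm documents for a function-valued `data` argument: it takes a single `reset` argument and returns the next chunk as a data frame, or NULL when no data are left. The fit itself only runs if biglm is installed.

```r
# Toy summarized data; freq is the number of observations each row stands for.
set.seed(1)
summarized <- data.frame(y = rnorm(6), x = 1:6,
                         freq = c(3L, 1L, 4L, 2L, 5L, 2L))

# Hypothetical helper: returns a reader that expands a few summarized
# rows at a time instead of replicating the whole dataset at once.
make_expanding_reader <- function(df, weight_col, rows_per_chunk = 3) {
  pos <- 0
  function(reset = FALSE) {
    if (reset) {
      pos <<- 0            # start again from the first row
      return(NULL)
    }
    if (pos >= nrow(df)) return(NULL)   # no more data
    idx <- (pos + 1):min(pos + rows_per_chunk, nrow(df))
    pos <<- max(idx)
    chunk <- df[idx, , drop = FALSE]
    # Repeat each row as many times as its weight says
    expanded <- chunk[rep(seq_len(nrow(chunk)), times = chunk[[weight_col]]), ,
                      drop = FALSE]
    expanded[[weight_col]] <- NULL
    rownames(expanded) <- NULL
    expanded
  }
}

reader <- make_expanding_reader(summarized, "freq")

# Only attempt the fit when biglm is available
if (requireNamespace("biglm", quietly = TRUE)) {
  fit <- biglm::bigglm(y ~ x, data = reader, family = gaussian())
  print(coef(fit))
}
```

Only one expanded chunk is held in memory at a time, which is the point of the chunked interface.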
Hope this helps,
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
greg.snow at intermountainmail.org
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Rick Bischoff
Sent: Wednesday, August 30, 2006 8:28 AM
To: r-help at stat.math.ethz.ch
Subject: [R] working with summarized data
The data sets I am working with all have a weight variable, i.e., each
row does not represent a single observation.
With that in mind, nearly all of the graphs and summary statistics are
incorrect for my data, because they don't take into account the weight.
For example, "median" is incorrect, as the quantiles aren't calculated
with the weights taken into account:
    sum( weights[X < median(X)] ) / sum(weights)
This should be 0.5... of course it's not.
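To make the check above concrete, here is a small made-up example (the values of `X` and `weights` are invented, and `share_below` is a hypothetical helper that wraps the expression above):

```r
X <- 1:6
weights <- c(1, 1, 1, 1, 1, 5)   # the value 6 stands for 5 observations

# Share of the total weight lying strictly below a cut point m
share_below <- function(m) sum(weights[X < m]) / sum(weights)

share_below(median(X))                        # 0.3: unweighted median is off
share_below(median(rep(X, times = weights)))  # 0.5: median of the expanded data
```

With heavily tied data the share strictly below the weighted median need not be exactly 0.5, so this check is a rough diagnostic rather than a strict test.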
Unfortunately, it seems that most (all?) of R's graphics and summary
statistic functions don't take a weight or frequency argument.
(Fortunately the models do...)
Am I completely missing how to do this? One way would be to replicate
each row in proportion to its weight (e.g., if the weight was 4, we would
add 3 additional copies) but this will get prohibitive pretty quickly as the
Thanks in advance!