[R] subset grouped data with quantile and NA's

jim holtman jholtman at gmail.com
Fri Aug 22 14:29:12 CEST 2008


This will also remove the NAs from the output; you will have to change it
if you also want to keep the NAs. I wasn't sure what you wanted to do
with them.

dat <- data.frame(fac = rep(c("a", "b"), each = 100),
                 value = c(rnorm(130), rep(NA, 70)),
                 other = rnorm(200))
# split the data
x.s <- split(dat, dat$fac, drop=TRUE)
# process the quantiles
x.l <- lapply(x.s, function(.fac){
    # 95% quantile of this group's non-missing values
    q <- quantile(.fac$value, probs = 0.95, na.rm = TRUE)
    # remove NAs from the output -- need to change if you want to keep NAs
    .fac[!is.na(.fac$value) & .fac$value <= q, ]
})
# put back into a dataframe
dat.new <- do.call(rbind, x.l)
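If you do want to keep the rows where value is NA, one option (a sketch,
assuming NA rows should simply pass through the filter untouched) is to
reverse the NA test: NA rows are always kept, and only non-missing values
above their group's 95% quantile are dropped. The set.seed() call and the
name dat.keepNA are only for illustration.

```r
set.seed(1)  # only to make the example reproducible
dat <- data.frame(fac = rep(c("a", "b"), each = 100),
                  value = c(rnorm(130), rep(NA, 70)),
                  other = rnorm(200))
# split the data by group
x.s <- split(dat, dat$fac, drop = TRUE)
x.l <- lapply(x.s, function(.fac) {
    # 95% quantile of the non-missing values in this group
    q <- quantile(.fac$value, probs = 0.95, na.rm = TRUE)
    # keep NA rows, plus non-NA rows at or below the group's quantile
    # (TRUE | NA is TRUE, so the index never contains NA)
    .fac[is.na(.fac$value) | .fac$value <= q, ]
})
# put back into a data frame
dat.keepNA <- do.call(rbind, x.l)
# rbind() builds row names such as "a.1"; reset them if they are not wanted
rownames(dat.keepNA) <- NULL
```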


On Fri, Aug 22, 2008 at 3:35 AM, David Carslaw
<d.c.carslaw at its.leeds.ac.uk> wrote:
>
> I can't quite seem to solve a problem subsetting a data frame.  Here's a
> reproducible example.
>
> Given a data frame:
>
> dat <- data.frame(fac = rep(c("a", "b"), each = 100),
>                  value = c(rnorm(130), rep(NA, 70)),
>                  other = rnorm(200))
>
> What I want is a new data frame (with the same columns as dat) excluding the
> top 5% of "value" separately by "a" and "b". For example, this produces the
> results I'm after in an array:
>
> sub <- tapply(dat$value, dat$fac, function(x) x[x < quantile(x, probs = 0.95, na.rm = TRUE)])
>
> My difficulty is putting them into a data frame along with the other columns
> "fac" and "other". Note that the subsets will be vectors of different
> lengths for a and b, due to the different numbers of NAs.
>
> There's something I'm just not seeing - can you help?
>
> Many thanks.
>
> David Carslaw
>
> -----
> Institute for Transport Studies
> University of Leeds
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
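A different route, not from the original thread and offered only as a
sketch, is base R's ave(), which returns a vector the same length as
dat$value holding each row's group quantile. Subsetting dat with it keeps
all three columns and the original row order, with no split()/rbind() step.
It follows the question's strict `<`; set.seed() and the names thr and
dat.sub are illustrative.

```r
set.seed(1)  # only to make the example reproducible
dat <- data.frame(fac = rep(c("a", "b"), each = 100),
                  value = c(rnorm(130), rep(NA, 70)),
                  other = rnorm(200))
# for every row, the 95% quantile of 'value' within that row's group
thr <- ave(dat$value, dat$fac,
           FUN = function(v) quantile(v, probs = 0.95, na.rm = TRUE))
# drop NA rows and rows in the top 5% of their group; all columns are kept
dat.sub <- dat[!is.na(dat$value) & dat$value < thr, ]
```

The !is.na() term matters: without it, NA values compare as NA, and
indexing a data frame with an NA row index returns a row of all NAs
instead of dropping that row.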



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


