[R] Add a column to a data frame with value based on the percentile of the row

Rui Barradas ruipbarradas at sapo.pt
Wed Jul 31 13:22:36 CEST 2013


Hello,

Combine quantile() with findInterval(): quantile() gives the cut points and
findInterval() tells you which interval each value falls into. Something like
the following.


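# sample data
x <- rnorm(100)

# labels, ordered from the lowest interval to the highest
val <- c("Bottom 50", "20 to 50", "5 to 20", "Top 5%")
# cut points: top 50%, top 20% and top 5% start at the 50th, 80th and 95th percentiles
qq <- quantile(x, probs = c(0, 0.50, 0.80, 0.95, 1))

# rightmost.closed = TRUE keeps max(x) in the last interval (otherwise it gets NA)
idx <- findInterval(x, qq, rightmost.closed = TRUE)
val[idx]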
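
To get the labels into the data frame itself, the same idea can be applied to
one of its columns. A minimal sketch only: the data frame 'dat' and the column
'score' are made-up names for illustration (only SEGMENT comes from the
question), and the 0.80 cut point assumes "20 to 50" means the values between
the top 50% and the top 20%.

```r
# hypothetical data frame; 'dat' and 'score' are illustrative names only
set.seed(1)
dat <- data.frame(score = rnorm(1000))

# labels, ordered from the lowest interval to the highest
val <- c("Bottom 50", "20 to 50", "5 to 20", "Top 5%")
qq <- quantile(dat$score, probs = c(0, 0.50, 0.80, 0.95, 1))

# rightmost.closed = TRUE keeps the maximum inside the last interval
dat$SEGMENT <- val[findInterval(dat$score, qq, rightmost.closed = TRUE)]

# number of rows per segment
table(dat$SEGMENT)
```

findInterval() is vectorized, so this should scale to a few million rows
without an explicit loop.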


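If the data frame is already sorted and the segments should follow row
position rather than the values of a column (one possible reading of "the
first 5% rows"), no quantiles are needed: the labels can be repeated according
to the row counts. A sketch under that assumption; 'dat' and 'score' are
made-up names, and the rounding of the cut positions is my own choice.

```r
# hypothetical data frame, sorted so that the "top" rows come first
set.seed(1)
dat <- data.frame(score = sort(rnorm(1000), decreasing = TRUE))

n <- nrow(dat)
labs <- c("Top 5%", "5 to 20", "20 to 50", "Bottom 50")

# last row number of the first three segments (5%, 20%, 50% of the rows)
cuts <- round(n * c(0.05, 0.20, 0.50))

# repeat each label for as many rows as its segment contains
dat$SEGMENT <- rep(labs, times = diff(c(0, cuts, n)))

head(dat)
```
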
Hope this helps,

Rui Barradas

On 31-07-2013 10:37, Dark wrote:
> Hi all,
>
> I think this should be an easy question for the gurus out here.
>
> I have this large data frame (2.500.000 rows, 15 columns) and I want to add
> a column named "SEGMENT" to it.
> The first 5% rows (first 125.000 rows) should have the value "Top 5%" in the
> SEGMENT column
> Then the rows from 5% to 20% should have the value "5 to 20"
> Then 20-50% should have the value "20 to 50"
> And the last 50% of the rows should have the value "Bottom 50"
>
> What is the easiest way of doing this? I was thinking of using quantile but
> then I would need some row-number column.
>
> Regards Derk
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Add-a-column-to-a-data-frame-with-value-based-on-the-percentile-of-the-row-tp4672711.html
> Sent from the R help mailing list archive at Nabble.com.
>


