[R] Add a column to a data frame with value based on the percentile of the row

Rui Barradas ruipbarradas at sapo.pt
Wed Jul 31 18:39:55 CEST 2013


Hello,

Sorry, that should be 0.80, not 0.70.

qq <- quantile(x, probs = c(0, 0.50, 0.80, 0.95, 1))
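
For completeness, here is a minimal sketch of the whole thing with the corrected break points, writing the labels into a SEGMENT column. The data frame 'df' and its column 'x' are made up for illustration, so substitute your own names. I have also added rightmost.closed = TRUE, which is not in my earlier code: without it findInterval() returns 5 for the maximum of x and val[5] is NA.

```r
# Sketch only: 'df' and its column 'x' are hypothetical example data.
set.seed(1)
df <- data.frame(x = rnorm(100))

# Labels in increasing order of x, one per interval between break points.
val <- c("Bottom 50", "20 to 50", "5 to 20", "Top 5%")
qq <- quantile(df$x, probs = c(0, 0.50, 0.80, 0.95, 1))

# rightmost.closed = TRUE assigns max(x) to the last interval (index 4)
# instead of index 5, which would have no matching label.
idx <- findInterval(df$x, qq, rightmost.closed = TRUE)
df$SEGMENT <- val[idx]

# Count the rows per segment as a quick check.
table(df$SEGMENT)
```

Note that this segments by the value of x, not by row position. If the rows are already sorted and only the position matters, the same idea should work with seq_len(nrow(df)) in place of df$x.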

Rui Barradas


On 31-07-2013 12:22, Rui Barradas wrote:
> Hello,
>
> Combine quantile() with findInterval(). Something like the following.
>
>
> # sample data
> x <- rnorm(100)
>
> val <- c("Bottom 50", "20 to 50", "5 to 20", "Top 5%")
> qq <- quantile(x, probs = c(0, 0.50, 0.70, 0.95, 1))
>
> idx <- findInterval(x, qq)
> val[idx]
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 31-07-2013 10:37, Dark wrote:
>> Hi all,
>>
>> I think this should be an easy question for the gurus out here.
>>
>> I have this large data frame (2.500.000 rows, 15 columns) and I want to add
>> a column named "SEGMENT" to it.
>> The first 5% of rows (first 125.000 rows) should have the value "Top 5%" in
>> the SEGMENT column.
>> Then the rows from 5% to 20% should have the value "5 to 20"
>> Then 20-50% should have the value "20 to 50"
>> And the last 50% of the rows should have the value "Bottom 50"
>>
>> What is the easiest way of doing this? I was thinking of using quantile but
>> then I should have some rownumber column.
>>
>> Regards Derk
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


