[R] mean for every quartile

David L Carlson dcarlson at tamu.edu
Mon May 16 16:07:02 CEST 2016

```Do you understand that quartiles divide the data into 4 groups?

Min (group 1) 1st quartile (group 2) median (group 3) 3rd quartile (group 4) max

But in your case df$BR has only 4 unique values:

> table(df$BR)

256 320 384 512
  2  74  24   2

So the first quartile is equal to the median:

> quantile(df$BR)
  0%  25%  50%  75% 100%
 256  320  320  368  512

You need to use the argument rightmost.closed=TRUE with findInterval(). If you do not, the 5th group consists of only those values that are equal to the maximum:

> df$quant <- findInterval(df$BR, quantile(df$BR), rightmost.closed=TRUE)
> tapply(df$BR, df$quant, mean)
       1        3        4
256.0000 320.0000 393.8462
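
To make the effect of rightmost.closed easier to see, here is a minimal sketch on a small made-up vector. The values in x and the names open_groups and closed_groups are assumptions for illustration only, not your data:

```r
# Hypothetical toy data: the integers 1 to 9, so the maximum occurs once
x <- 1:9
breaks <- quantile(x)   # cutpoints 1, 3, 5, 7, 9

# Default: every interval is [lower, upper), so the maximum (9) is not
# inside the 4th interval and ends up alone in a 5th group
open_groups <- findInterval(x, breaks)

# rightmost.closed=TRUE: the last interval becomes [7, 9], so the
# maximum joins the 4th group
closed_groups <- findInterval(x, breaks, rightmost.closed = TRUE)

table(open_groups)
table(closed_groups)
```

The only value that changes group is the maximum, which moves from group 5 to group 4.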

Using values that are more variable:

> set.seed(42)
> df <- data.frame(BR=sample.int(100, 100, replace=TRUE))
> df$quant <- findInterval(df$BR, quantile(df$BR), rightmost.closed=TRUE)
> tapply(df$BR, df$quant, mean)
    1     2     3     4
12.48 41.24 67.24 90.64
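
As an alternative to findInterval(), the same grouping can probably be obtained with cut(). This is a swapped-in technique, not what the code above does; the names breaks, grp_cut and grp_fi are assumptions for illustration. A minimal sketch, assuming the quantiles are all distinct:

```r
set.seed(42)
df <- data.frame(BR = sample.int(100, 100, replace = TRUE))
breaks <- quantile(df$BR)

# right=FALSE gives [lower, upper) intervals like findInterval();
# include.lowest=TRUE then closes the last interval so the maximum is kept
grp_cut <- cut(df$BR, breaks = breaks, right = FALSE,
               include.lowest = TRUE, labels = FALSE)

# Reference grouping from findInterval() for comparison
grp_fi <- findInterval(df$BR, breaks, rightmost.closed = TRUE)

identical(as.integer(grp_cut), as.integer(grp_fi))
tapply(df$BR, grp_cut, mean)
```

Note that cut() stops with an error when the breaks are not unique, which is the case for the original data (320 appears twice among the quantiles), so findInterval() is the safer choice there.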

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of ch.elahe via R-help
Sent: Monday, May 16, 2016 8:46 AM
To: Michael Dewey; ulrik.stervbo at gmail.com
Cc: R-help Mailing List
Subject: Re: [R] mean for every quartile

By using tapply I get this result:

tapply(df$BR, findInterval(df$BR, quantile(df$BR)), mean)
  1   3   4   5
256 320 384 512

But I think this is not right, because I should get 5 means but here I get only four numbers!

On Monday, May 16, 2016 6:29 AM, Michael Dewey <lists at dewey.myzen.co.uk> wrote:
Dear Elahe

Comments inline.

On 16/05/2016 13:31, ch.elahe via R-help wrote:
> Hi all,
> I have a column in my df and I want to get quartiles for this column and then calculate mean for each and every quartile, here is my column:
>

The quartiles are, strictly speaking, the boundaries, but if you really
meant that, the problem is trivial, so I assume you want to cut the
variable at the quartiles.

>
>     df$BR
>     [1] 384 384 384 384 512 384 384 320 320 320 320 320 320 320 320 320 320 384
>     [19] 384 384 320 320 320 320 384 384 256 320 320 320 384 320 320 320 384 384
>     [37] 320 320 320 320 320 320 320 320 320 384 320 320 320 320 320 320 384 320
>     [55] 320 320 320 320 320 320 384 512 320 320 320 320 320 320 320 384 384 320
>     [73] 320 320 384 320 320 320 320 256 320 320 384 320 384 320 384 320 320 320
>     [91] 384 320 320 320 320 320 320 320 320 320 320 320
>
> I do the following to get the quartiles:
>
>
>     quantile(m$BR)
>       0%  25%  50%  75% 100%
>      256  320  320  368  512
>
> now how can I get mean for each quartile?

How about setting up a vector which takes the values 1, 2, 3, 4
depending on the values of BR, with cutpoints defined by
quantile(BR) (using ifelse), and then using tapply?
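
A minimal sketch of that idea, assuming a small made-up vector (the values in BR and the names q, grp and means are hypothetical, chosen only to illustrate):

```r
# Hypothetical data, not the poster's column
BR <- c(12, 25, 37, 41, 58, 63, 77, 89)
q <- quantile(BR)   # 0%, 25%, 50%, 75%, 100%

# Nested ifelse: compare against the 25%, 50% and 75% cutpoints in turn;
# anything at or above the 75% cutpoint (including the maximum) is group 4
grp <- ifelse(BR < q[2], 1,
       ifelse(BR < q[3], 2,
       ifelse(BR < q[4], 3, 4)))

# Mean of BR within each of the four groups: 18.5, 39, 60.5, 83
means <- tapply(BR, grp, mean)
means
```

Because the comparisons are strict, a value equal to a cutpoint goes into the upper group, and tied cutpoints can leave a group empty.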

> Thanks for any help,
> Elahe
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> and provide commented, minimal, self-contained, reproducible code.
>

--
Michael
http://www.dewey.myzen.co.uk/home.html
