[R] quantile function

Thomas Lumley tlumley at u.washington.edu
Fri Feb 6 17:04:39 CET 2004


On Fri, 6 Feb 2004, Giovanni Petris wrote:

>
> I am trying to `cut' a continuous variable into contiguous classes
> containing approximately an equal number of observations. I thought
> quantile() was the appropriate function to use in order to find the
> breakpoints, but I end up with classes of different sizes - see
> example below. Does anybody have an explanation for that? And what is
> the `recommended' way of computing what I am looking for?

Your variable is actually quite discrete (it has many tied values), which is causing the problem.
For example, you have two 35s, so the first two groups could only be equal if one
35 went into the first group and the other into the second.
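A quick check in R makes the tie visible. This is a sketch, not part of the original reply: it types the ages from the example below into a vector called `age` (a name chosen here) instead of using `ca$age`. With 78 observations an exact split into ten groups would need 7.8 per group, but the 8th and 9th smallest ages are both 35, so a break that depends only on the value cannot separate them.

```r
# Ages from the example (ca$age), entered directly so the snippet is self-contained.
age <- c(28, 42, 46, 45, 34, 44, 48, 45, 38, 45, 49, 45, 41, 46, 49, 46, 44, 48, 52, 48,
         45, 50, 53, 57, 46, 52, 54, 57, 47, 52, 55, 59, 50, 54, 57, 60, 51, 55, 46, 63,
         51, 59, 48, 35, 53, 59, 57, 37, 55, 32, 60, 43, 59, 37, 30, 47, 60, 38, 34, 48,
         32, 38, 36, 49, 33, 42, 38, 58, 35, 43, 39, 59, 39, 43, 42, 60, 40, 44)

n <- length(age)     # 78 observations
n / 10               # 7.8 per group if the split could be exact
sort(age)[1:10]      # 28 30 32 32 33 34 34 35 35 36: the 8th and 9th are both 35
quantile(age, 0.1)   # so the 10% point lands on 35 ...
sum(age <= 35)       # ... and the first group (-Inf,35] holds 9 observations
```

The same reasoning applies at the other breakpoints: wherever a quantile falls on a tied value, every copy of that value lands on the same side of the break.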

Now, if you want the groups to be equal even at the cost of group membership not
depending only on the value, there are at least two possible approaches:
 - break ties randomly, for example by jitter()ing the data first;
 - order the data by age and then take the first 8, the next 8, and so on.
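Both approaches can be sketched in R roughly as follows. The vector name `age`, the seed, and the choice of ten groups are assumptions made for illustration. The rank-based version uses `ceiling(10 * rank / n)`, a balanced variant of "first 8, next 8" that gives groups of 7 or 8 instead of nine 8s and a final 6.

```r
# Ages from the example (ca$age), entered directly so the snippet is self-contained.
age <- c(28, 42, 46, 45, 34, 44, 48, 45, 38, 45, 49, 45, 41, 46, 49, 46, 44, 48, 52, 48,
         45, 50, 53, 57, 46, 52, 54, 57, 47, 52, 55, 59, 50, 54, 57, 60, 51, 55, 46, 63,
         51, 59, 48, 35, 53, 59, 57, 37, 55, 32, 60, 43, 59, 37, 30, 47, 60, 38, 34, 48,
         32, 38, 36, 49, 33, 42, 38, 58, 35, 43, 39, 59, 39, 43, 42, 60, 40, 44)
n <- length(age)  # 78

# Approach 1: break ties randomly with jitter(), then cut at the deciles.
# For integer data the default noise stays well below 0.5, so only tied
# observations change order; distinct ages keep their relative order.
set.seed(1)                                   # illustrative seed, for reproducibility
age_j <- jitter(age)
brks <- quantile(age_j, probs = (0:10) / 10)  # no ties now, so the breaks are distinct
g_jitter <- cut(age_j, breaks = brks, include.lowest = TRUE, labels = FALSE)
table(g_jitter)                               # every group has 7 or 8 observations

# Approach 2: order by age and hand out consecutive ranks.
# ties.method = "first" breaks ties by position in the data;
# ties.method = "random" would break them at random instead.
r <- rank(age, ties.method = "first")
g_rank <- ceiling(10 * r / n)                 # ranks 1-7 -> group 1, 8-15 -> group 2, ...
table(g_rank)
```

Either way the price is the one mentioned above: two people with the same age can end up in different groups.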

	-thomas


> Example:
>
> > ca$age
>  [1] 28 42 46 45 34 44 48 45 38 45 49 45 41 46 49 46 44 48 52 48 45 50
> 53 57 46  52 54 57 47 52 55 59 50 54 57 60 51 55 46 63 51 59 48 35
> 53 59 57 37 55 32  60 43 59 37 30 47 60 38 34 48 32 38 36 49 33 42
> 38 58 35 43 39 59 39 43 42  60 40 44

> > table(cut(ca$age,breaks=c(-Inf,quantile(ca$age, seq(0,1,length=11)[-1]))))
>
> (-Inf,35] (35,38.4] (38.4,43]   (43,45] (45,46.5] (46.5,49]   (49,52]   (52,55]
>         9         7        10         8         5        10         7         7
>   (55,59]   (59,63]
>        10         5
>
> Thanks in advance,
> Giovanni
>
> --
>
>  __________________________________________________
> [                                                  ]
> [ Giovanni Petris                 GPetris at uark.edu ]
> [ Department of Mathematical Sciences              ]
> [ University of Arkansas - Fayetteville, AR 72701  ]
> [ Ph: (479) 575-6324, 575-8630 (fax)               ]
> [ http://definetti.uark.edu/~gpetris/              ]
> [__________________________________________________]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



