[R] Trying to understand cut

Jim Lemon drjimlemon at gmail.com
Sun Apr 17 07:15:41 CEST 2016


Hi John,
Both the "right" and "include.lowest" arguments affect the grouping
only when there are values equal to those in "breaks". A value equal
to a break can fall on either side of the break depending upon these
arguments:

> nums<-1:100
> table(cut(nums,breaks=seq(0,100,by=10)))

 (0,10]  (10,20]  (20,30]  (30,40]  (40,50]  (50,60]  (60,70]  (70,80]
     10       10       10       10       10       10       10       10
(80,90] (90,100]
     10       10

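With the default right=TRUE the intervals are right-closed, so a value
equal to a break falls in the interval below it. With right=FALSE the
intervals are left-closed: every value equal to a break is shifted up
into the next interval, and the value 100 falls outside the last
interval and is lost (it becomes NA):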

> table(cut(nums,breaks=seq(0,100,by=10),right=FALSE))

 [0,10)  [10,20)  [20,30)  [30,40)  [40,50)  [50,60)  [60,70)  [70,80)
      9       10       10       10       10       10       10       10
[80,90) [90,100)
     10       10

But if I set include.lowest=TRUE (which really means "include the
highest" when right=FALSE), the last interval is closed on the right
and the highest value (100) is preserved:

> table(cut(nums,breaks=seq(0,100,by=10),right=FALSE,include.lowest=TRUE))

 [0,10)  [10,20)  [20,30)  [30,40)  [40,50)  [50,60)  [60,70)  [70,80)
      9       10       10       10       10       10       10       10
[80,90) [90,100]
     10       11
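
To make the "lost" value visible, here is a minimal sketch using the same nums and breaks as above. The variable names (brks, left_closed, left_closed_incl, lost, lost_incl) are only for illustration. It relies on cut() returning NA for a value that falls in no interval, and on table() dropping NA unless asked to show it.

```r
# Same data and breaks as in the examples above
nums <- 1:100
brks <- seq(0, 100, by = 10)

# Left-closed intervals, without and with include.lowest
left_closed      <- cut(nums, breaks = brks, right = FALSE)
left_closed_incl <- cut(nums, breaks = brks, right = FALSE,
                        include.lowest = TRUE)

# Which values were not assigned to any interval?
lost      <- nums[is.na(left_closed)]
lost_incl <- nums[is.na(left_closed_incl)]
print(lost)
print(lost_incl)

# table() hides NA by default; useNA = "ifany" gives it its own column
print(table(left_closed, useNA = "ifany"))
```

Checking is.na() on the result of cut() is a quick way to confirm that the breaks cover the whole range of the data.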

Try

data.frame(A=nums,
 B=cut(nums,breaks=seq(0,100,by=10),right=FALSE,
 include.lowest=TRUE))

to see which interval each value is assigned to.
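
As for why your four calls gave the same groups: with breaks=10, cut() chooses the break points itself from the range of the data. The labels in your output (9.91, 19.9, ...) suggest that none of those computed breaks coincides with a value in your data, so "right" and "include.lowest" have nothing to act on. The sketch below is my reading of your example, not a definitive diagnosis. It compares the integer group codes of the four results, then uses explicit breaks at multiples of 10 with right=FALSE to get the groups >=0 <10, ..., >=90 <100. The names same_groups and grp are only for illustration.

```r
# John's data: the integers 0..99 plus 0.9, 1.9, ..., 99.9
values <- c(0:99, 0.9:99.9)

# breaks = 10 lets cut() pick 10 equal-width intervals from the data range
c1 <- cut(values, 10, include.lowest = FALSE, right = TRUE)
c2 <- cut(values, 10, include.lowest = FALSE, right = FALSE)
c3 <- cut(values, 10, include.lowest = TRUE,  right = TRUE)
c4 <- cut(values, 10, include.lowest = TRUE,  right = FALSE)

# Compare group membership via the integer codes; the labels differ
# only in their brackets
same_groups <- identical(as.integer(c1), as.integer(c2)) &&
               identical(as.integer(c1), as.integer(c3)) &&
               identical(as.integer(c1), as.integer(c4))
print(same_groups)

# Explicit breaks at multiples of 10, left-closed: [0,10), ..., [90,100)
grp <- cut(values, breaks = seq(0, 100, by = 10), right = FALSE)
print(table(grp))
```

With explicit breaks the integers 10, 20, ... sit exactly on a break, so here right=FALSE does change which group they land in.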

Jim

On Sun, Apr 17, 2016 at 2:12 PM, John Sorkin
<jsorkin at grecc.umaryland.edu> wrote:
> Jeff,
> Perhaps I was sloppy with my notation:
> I want groups
> >=0 <10
> >=10 <20
> >=20 <30
> ......
> >=90 <100
>
> In any event, my question remains, why did the four different versions of cut give me the same results? I hope someone can explain to me the function of
> include.lowest and right in the call to cut. As demonstrated in my example below, the parameters do not seem to alter the results of using cut.
> Thank you,
> John
>
>
> P.S. How do I find FAQ 7.31?
> Thank you,
> John
>
>
>
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>>>> Jeff Newmiller <jdnewmil at dcn.davis.ca.us> 04/16/16 11:07 PM >>>
> Have you read FAQ 7.31 recently, John? Your whole premise is flawed. You should be thinking of ranges [0,10), [10,20), and so on because numbers ending in 0.9 are never going to be exact.
> --
> Sent from my phone. Please excuse my brevity.
>
>
> On April 16, 2016 7:38:50 PM PDT, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
> I am trying to understand cut so I can divide a list of numbers into 10 groups:
>   0-9.0
> 10-10.9
> 20-20.9
> 30-30.9,
> 40-40.9,
> 50-50.9
> 60-60.9
> 70-70.9
> 80-80.9
> 90-90.9
>
> As I try to do this, I have been playing with the cut function. Surprisingly, the following four applications of cut give me the exact same groups. This surprises me given that I have varied the parameters include.lowest and right. Can someone help me understand what include.lowest and right do? I have looked at the help page, but I don't seem to understand what I am being told!
> Thank you,
> John
>
> values <- c((0:99),c(0.9:99.9))
> sort(values)
> c1<-cut(values,10,include.lowest=FALSE,right=TRUE)
> c2<-cut(values,10,include.lowest=FALSE,right=FALSE)
> c3<-cut(values,10,include.lowest=TRUE,right=TRUE)
> c4<-cut(values,10,include.lowest=TRUE,right=FALSE)
> cbind(min=aggregate(values,list(c1),min),max=aggregate(values,list(c1),max))
> cbind(min=aggregate(values,list(c2),min),max=aggregate(values,list(c2),max))
> cbind(min=aggregate(values,list(c3),min),max=aggregate(values,list(c3),max))
> cbind(min=aggregate(values,list(c4),min),max=aggregate(values,list(c4),max))
>
> You can run the code above, or inspect the results I got, which are reproduced below:
>
>  cbind(min=aggregate(values,list(c1),min),max=aggregate(values,list(c1),max))
>
>       min.Group.1 min.x    max.Group.1 max.x
> 1  (-0.0999,9.91]     0 (-0.0999,9.91]   9.9
> 2     (9.91,19.9]    10    (9.91,19.9]  19.9
> 3     (19.9,29.9]    20    (19.9,29.9]  29.9
> 4     (29.9,39.9]    30    (29.9,39.9]  39.9
> 5       (39.9,50]    40      (39.9,50]  49.9
> 6         (50,60]    50        (50,60]  59.9
> 7         (60,70]    60        (60,70]  69.9
> 8         (70,80]    70        (70,80]  79.9
> 9         (80,90]    80        (80,90]  89.9
> 10       (90,100]    90       (90,100]  99.9
>  cbind(min=aggregate(values,list(c2),min),max=aggregate(values,list(c2),max))
>
>       min.Group.1 min.x    max.Group.1 max.x
> 1  [-0.0999,9.91)     0 [-0.0999,9.91)   9.9
> 2     [9.91,19.9)    10    [9.91,19.9)  19.9
> 3     [19.9,29.9)    20    [19.9,29.9)  29.9
> 4     [29.9,39.9)    30    [29.9,39.9)  39.9
> 5       [39.9,50)    40      [39.9,50)  49.9
> 6         [50,60)    50        [50,60)  59.9
> 7         [60,70)    60        [60,70)  69.9
> 8         [70,80)    70        [70,80)  79.9
> 9         [80,90)    80        [80,90)  89.9
> 10       [90,100)    90       [90,100)  99.9
>  cbind(min=aggregate(values,list(c3),min),max=aggregate(values,list(c3),max))
>
>       min.Group.1 min.x    max.Group.1 max.x
> 1  [-0.0999,9.91]     0 [-0.0999,9.91]   9.9
> 2     (9.91,19.9]    10    (9.91,19.9]  19.9
> 3     (19.9,29.9]    20    (19.9,29.9]  29.9
> 4     (29.9,39.9]    30    (29.9,39.9]  39.9
> 5       (39.9,50]    40      (39.9,50]  49.9
> 6         (50,60]    50        (50,60]  59.9
> 7         (60,70]    60        (60,70]  69.9
> 8         (70,80]    70        (70,80]  79.9
> 9         (80,90]    80        (80,90]  89.9
> 10       (90,100]    90       (90,100]  99.9
>  cbind(min=aggregate(values,list(c4),min),max=aggregate(values,list(c4),max))
>
>       min.Group.1 min.x    max.Group.1 max.x
> 1  [-0.0999,9.91)     0 [-0.0999,9.91)   9.9
> 2     [9.91,19.9)    10    [9.91,19.9)  19.9
> 3     [19.9,29.9)    20    [19.9,29.9)  29.9
> 4     [29.9,39.9)    30    [29.9,39.9)  39.9
> 5       [39.9,50)    40      [39.9,50)  49.9
> 6         [50,60)    50        [50,60)  59.9
> 7         [60,70)    60        [60,70)  69.9
> 8         [70,80)    70        [70,80)  79.9
> 9         [80,90)    80        [80,90)  89.9
> 10       [90,100]    90       [90,100]  99.9


