[R] Improvement: function cut

Andrew Simmons
Sat Sep 18 00:29:32 CEST 2021


I disagree: I don't really think 'include.lowest' is too long or ugly. But
if you think it is, you could abbreviate it as 'i'.


x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)
data.frame(
    cut(x, breaks1, right = FALSE, i = TRUE),
    cut(x, breaks2, right = FALSE, i = TRUE),
    check.names = FALSE
)
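
The abbreviation works because R partially matches argument names: cut()
dispatches to cut.default(), where 'include.lowest' is the only formal
argument starting with 'i'. A minimal check, reusing x and breaks1 from the
example above (redefined here so the snippet stands alone):

```r
x <- 0:20
breaks1 <- seq.int(0, 16, 4)

# 'i' is partially matched to 'include.lowest' in cut.default()
short <- cut(x, breaks1, right = FALSE, i = TRUE)
long  <- cut(x, breaks1, right = FALSE, include.lowest = TRUE)
identical(short, long)

# the last interval is closed on the right, so 16 is kept;
# 17:20 lie outside every interval and become NA
levels(long)
table(long, useNA = "ifany")
```

Partial matching is convenient interactively, but spelling out
'include.lowest' is safer in scripts: an abbreviation can become ambiguous
if a function later gains another argument with the same prefix.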


I hope this helps.
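
The 'warn' idea discussed further down the thread can be approximated in
user code, because cut() returns NA for values that fall outside all
intervals. A minimal sketch: 'cut_warn' is a hypothetical helper name, not
part of base R, and defaulting 'include.lowest' to TRUE is an assumption
made only for illustration.

```r
# Hypothetical helper (not part of base R): cut() plus a warning when
# non-missing values fall outside every interval.
cut_warn <- function(x, breaks, ..., include.lowest = TRUE) {
    out <- cut(x, breaks, ..., include.lowest = include.lowest)
    # NA in the result but not in the input means the value was dropped
    dropped <- is.na(out) & !is.na(x)
    if (any(dropped))
        warning(sum(dropped), " value(s) not included in any interval")
    out
}

# 17:20 lie beyond the last break, so this call issues a warning
res <- cut_warn(0:20, seq.int(0, 16, 4), right = FALSE)
```

Values that are already NA in the input are not counted, so the warning
only reports values that the breaks failed to cover.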

On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada <leo.mada using syonic.eu> wrote:

> Hello Andrew,
>
>
> But "cut" generates factors. In most cases with real data, one expects the
> ends of the interval to be included as well: the argument "include.lowest"
> is both ugly and too long.
>
> [The test code in the ftable thread contains this error! I have run
> into this error a couple of times.]
>
>
> The only real situation that I can imagine to be problematic:
>
> - if the interval goes to +Inf (or -Inf): I do not know if there would be
> any effects when including +Inf (or -Inf).
>
>
> Leonard
>
>
> On 9/18/2021 1:14 AM, Andrew Simmons wrote:
>
> While it is not explicitly mentioned anywhere in the documentation for
> .bincode, I suspect 'include.lowest = FALSE' is the default to keep the
> definitions of the bins consistent. For example:
>
>
> x <- 0:20
> breaks1 <- seq.int(0, 16, 4)
> breaks2 <- seq.int(0, 20, 4)
> cbind(
>     .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
>     .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
> )
>
>
> With 'include.lowest = TRUE' and different end points, you can get
> inconsistent behaviour. While this probably wouldn't be an issue with
> 'real' data, it seems like something you'd want to avoid by default.
> The definitions of the bins are
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16]
>
>
> and
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16)
> [16, 20]
>
>
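> so you can see where the inconsistent behaviour comes from: 16 lands in
> the closed bin [12, 16] with the first set of breaks, but in [16, 20] with
> the second. You might be able to get R-core to add an argument 'warn', but
> probably not to change the default of 'include.lowest'. I hope this helps.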
>
>
> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada <leo.mada using syonic.eu> wrote:
>
>> Thank you Andrew.
>>
>>
>> Is there any reason not to make: include.lowest = TRUE the default?
>>
>>
>> Regarding the NA:
>>
>> The user still has to suspect that some values were not included and run
>> that test.
>>
>>
>> Leonard
>>
>>
>> On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>>
>> Regarding your first point, the argument 'include.lowest' already handles
>> this specific case; see ?.bincode.
>>
>> As for your second point, it might be helpful, but since both
>> 'cut.default' and '.bincode' return NA if a value isn't within a bin, you
>> could write something like this on your own.
>> It might be worth pitching on the R-bugs wishlist.
>>
>>
>>
>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help <r-help using r-project.org>
>> wrote:
>>
>>> Hello List members,
>>>
>>>
>>> the following improvements would be useful for function cut (and
>>> .bincode):
>>>
>>>
>>> 1.) Argument: Include extremes
>>> extremes = TRUE
>>> if(right == FALSE) {
>>>     # include also right for last interval;
>>> } else {
>>>     # include also left for first interval;
>>> }
>>>
>>>
>>> 2.) Argument: warn = TRUE
>>>
>>> Warn if any values are not included in the intervals.
>>>
>>>
>>> Motivation:
>>> - reduce risk of errors when using function cut();
>>>
>>>
>>> Sincerely,
>>>
>>>
>>> Leonard
>>>
>>
