[Rd] split() - unexpected sorting of results

Peter Meissner retep.meissner at gmail.com
Sun Oct 22 14:01:48 CEST 2017


Thank you all for your input - most appreciated.

Best, Peter

On 21.10.2017 at 07:35, "Rui Barradas" <ruipbarradas at sapo.pt> wrote:

> Hello,
>
> To sort numbers stored as character strings in numeric order, there is
> the package stringr, with the functions str_sort and str_order.
>
> library(stringr)
>
> set.seed(2447)
>
> x <- sample(11L)
> sort(as.character(x))
> #[1] "1"  "10" "11" "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"
>
> str_sort(as.character(x), numeric = TRUE)
> #[1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"
>
> str_order(as.character(x), numeric = TRUE)
> #[1]  1  4 11  8  6  5  3 10  9  7  2
>
> i <- str_order(as.character(x), numeric = TRUE)
> as.character(x)[i]
> #[1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"
>
>
> Unfortunately this does not solve the OP's question: factor(),
> as.factor(), split() and others use the base R sorter, and this can only
> be changed by changing their sources.
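As a hedged sketch in base R only: the default level order stays as it is, but a caller can pass explicitly ordered levels to factor() before calling split(). The names g, lv, f and res are made up for illustration, and the code assumes every string parses as a number, which is not stated in the thread.

```r
set.seed(2447)
g <- as.character(sample(11L))   # numbers stored as character strings

# Order the unique values numerically instead of as strings.
# Assumption: every element of g can be converted with as.numeric().
lv <- unique(g)
lv <- lv[order(as.numeric(lv))]

# Supplying the levels explicitly bypasses the default character sort.
f <- factor(g, levels = lv)
res <- split(seq_along(g), f)
names(res)
```

With stringr installed, str_sort(unique(g), numeric = TRUE) as shown above could presumably supply the levels instead of the as.numeric() step.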
>
> Hope this helps,
>
> Rui Barradas
>
> On 21-10-2017 at 00:32, Hervé Pagès wrote:
>
>> Hi,
>>
>> On 10/20/2017 12:53 PM, Peter Meissner wrote:
>>
>>> Thanks, for the explanation.
>>>
>>> Still, I think this is surprising behaviour which might be handled
>>> better.
>>>
>>
>> Maybe a little surprising, but no more than:
>>
>>  > x <- sample(11L)
>>
>>  > sort(x)
>>   [1]  1  2  3  4  5  6  7  8  9 10 11
>>
>>  > sort(as.character(x))
>>   [1] "1"  "10" "11" "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"
>>
>> The fact that sort(), as.factor(), split() and many other things behave
>> consistently with respect to the underlying order of character vectors
>> avoids other even bigger surprises.
>>
>> Also note that the underlying order of character vectors actually
>> depends on your locale. One way to guarantee consistent results across
>> platforms/locales is by explicitly specifying the levels when making
>> a factor e.g.
>>
>>    f <- factor(x, levels=unique(x))
>>    split(1:11, f)
>>
>> This is particularly sensible when writing unit tests.
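A minimal sketch of the explicit-levels idea applied to the character case from the original question. The names g, f and by_appearance are illustrative only; the first call depends on the locale's collation, so no output is shown for it.

```r
# Grouping values stored as character strings, in their original order.
g <- as.character(1:11)

# Default: levels are the sorted unique strings, so the group order
# follows the (locale-dependent) character collation.
names(split(1:11, g))

# Explicit levels: groups come out in order of first appearance,
# independent of the locale.
f <- factor(g, levels = unique(g))
by_appearance <- split(1:11, f)
names(by_appearance)
```

In a unit test, comparing against names(by_appearance) rather than the default ordering should avoid platform-dependent failures.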
>>
>> Cheers,
>> H.
>>
>>
>>> Best, Peter
>>>
>>> On 20.10.2017 at 9:49 PM, "Iñaki Úcar" <i.ucar86 at gmail.com> wrote:
>>>
>>>> Hi Peter,
>>>>
>>>> 2017-10-20 21:33 GMT+02:00 Peter Meissner <retep.meissner at gmail.com>:
>>>>
>>>>> Hey,
>>>>>
>>>>> I found this - for me - quite surprising and puzzling behaviour of
>>>>> split().
>>>>>
>>>>>
>>>>> split(1:11, as.character(1:11))
>>>>> split(1:11, 1:11)
>>>>>
>>>>>
>>>>> When splitting by numerics everything works as expected - sorting of
>>>>> input == sorting of output -- but when using a character vector
>>>>> everything gets re-sorted alphabetically.
>>>>>
>>>>>
>>>>> Although there are some references in the help files to what happens
>>>>> when using split, I did not find any note on this - for me - rather
>>>>> unexpected behaviour.
>>>>>
>>>>
>>>> As the documentation states,
>>>>
>>>>         f: a ‘factor’ in the sense that ‘as.factor(f)’ defines the
>>>>            grouping, or a list of such factors in which case their
>>>>            interaction is used for the grouping.
>>>>
>>>> And, in fact,
>>>>
>>>> > as.factor(1:11)
>>>>   [1] 1  2  3  4  5  6  7  8  9  10 11
>>>> Levels: 1 2 3 4 5 6 7 8 9 10 11
>>>>
>>>> > as.factor(as.character(1:11))
>>>>   [1] 1  2  3  4  5  6  7  8  9  10 11
>>>> Levels: 1 10 11 2 3 4 5 6 7 8 9
>>>>
>>>> Regards,
>>>> Iñaki
>>>>
>>>>> I would like it best when the sorting of split results stays the same
>>>>> no matter the input (sorting of input == sorting of output).
>>>>>
>>>>> If that is not possible, a note of caution in the help pages and maybe
>>>>> an example might be valuable.
>>>>>
>>>>>
>>>>> Best, Peter
>>>>>
>>>>>          [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>
>>>>>
>>>>
>>
