[R] Adding SORT to UNIQUE

Jeff Newmiller
Tue Dec 21 17:59:09 CET 2021


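Intuitive, perhaps, but noticeably slower. And by design it doesn't work on tibbles: Data[, 1] on a tibble returns a one-column tibble, not a vector. Data frames are lists of columns, so Data[[1]] is the direct way to get one.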
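
To make the list-versus-matrix distinction concrete, here is a minimal sketch on a made-up data frame. The name `Data` follows the thread, but the column names and values are invented for illustration. It uses only base R; the `drop = FALSE` call is just a stand-in for how a tibble responds to `Data[, 1]`, since the tibble package is not loaded here.

```r
# Toy data, invented for illustration; not the poster's Source.csv.
Data <- data.frame(id  = c("10", "2", "1", "2"),
                   grp = c("b", "a", "b", "a"),
                   stringsAsFactors = FALSE)

one_col_df <- Data[1]     # list-style single bracket: a one-column data frame
col_vec    <- Data[[1]]   # list-style double bracket: the column vector
col_mat    <- Data[, 1]   # matrix-style: also the column vector (drop = TRUE)

is.data.frame(one_col_df)     # TRUE
identical(col_vec, col_mat)   # TRUE

# sort() needs a vector, so either vector form works.
sorted_ids <- sort(unique(Data[[1]]))

# A tibble keeps the data-frame shape for Data[, 1]; base R mimics that with
# drop = FALSE. Passing this to sort() runs into the same problem as Data[1].
kept_df <- Data[, 1, drop = FALSE]
is.data.frame(kept_df)        # TRUE
```

Because the column is character, `sorted_ids` is in text order, with "10" placed before "2".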

On December 21, 2021 8:38:35 AM PST, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>> Thanks for the reply.
>>>
>>> sort(unique(Data[1]))
>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
>>> decreasing)) :
>>>      undefined columns selected
>> 
>> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
>> Data[[1]] for that, so
>> 
>>     sort(unique(Data[[1]]))
>
>Actually, I'd probably recommend
>
>   sort(unique(Data[, 1]))
>
>instead.  This treats Data as a matrix rather than as a list. 
>Dataframes are lists that look like matrices, but to me the matrix 
>aspect is usually more intuitive.
>
>Duncan Murdoch
>
>> 
>> I think Rui already pointed out the typo in the quoted text below...
>> 
>> Duncan Murdoch
>> 
>>>
>>> The recommended syntax did not work, as listed above.
>>>
>>> What I want is the sort of distinct column output. Again, the column may
>>> be text or numbers. This is a huge analysis effort with data coming at
>>> me from many different sources.
>>>
>>>
>>> *Stephen Dawson, DSL*
>>> /Executive Strategy Consultant/
>>> Business & Technology
>>> +1 (865) 804-3454
>>> http://www.shdawson.com <http://www.shdawson.com>
>>>
>>>
>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>>>>> Thanks everyone for the replies.
>>>>>
>>>>> It is clear one either needs to write a function or put the unique
>>>>> entries into another dataframe.
>>>>>
>>>>> It seems odd R cannot sort a list of unique column entries with ease.
>>>>> Python and SQL can do it with ease.
>>>>
>>>> I've seen several responses that looked pretty simple.  It's hard to
>>>> beat sort(unique(x)), though there's a fair bit of confusion about
>>>> what you actually want.  Maybe you should post an example of the code
>>>> you'd use in Python?
>>>>
>>>> Duncan Murdoch
>>>>
>>>>>
>>>>> QUESTION
>>>>> Is there a simpler means, other than the unique function, to capture
>>>>> distinct column entries and then sort that list?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Inline.
>>>>>>
>>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>> Thanks.
>>>>>>>
>>>>>>> sort(unique(Data[[1]]))
>>>>>>>
>>>>>>> This syntax provides row numbers, not column values.
>>>>>>
>>>>>> This is not right.
>>>>>> The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
>>>>>> extracts the column vector.
>>>>>>
>>>>>> My previous answer did not address the question; I misinterpreted
>>>>>> it as a question on how to sort in numeric order when the data is
>>>>>> not numeric. Here is a, hopefully, complete answer.
>>>>>> Still with package stringr.
>>>>>>
>>>>>>
>>>>>> cols_to_sort <- 1:4
>>>>>>
>>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>>>>      stringr::str_sort(unique(x), numeric = TRUE)
>>>>>> })
>>>>>>
>>>>>>
>>>>>> Or using Avi's suggestion of writing a function to do all the work and
>>>>>> simplify the lapply loop later,
>>>>>>
>>>>>>
>>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>>>>>
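
For readers without stringr, a rough base-R sketch of the same idea follows. The helper name `unisort_base` and the toy data are hypothetical, not from the thread. The rule it applies (sort numerically only when every non-missing unique value parses as a number, otherwise fall back to an ordinary sort) is an assumption, and it is simpler than what `stringr::str_sort(numeric = TRUE)` does for mixed strings such as "a10".

```r
# Hypothetical helper: unique values, sorted numerically when possible.
unisort_base <- function(vec) {
  u <- unique(vec)
  u <- u[!is.na(u)]
  # as.numeric() yields NA (with a warning) for text that is not a number.
  num <- suppressWarnings(as.numeric(as.character(u)))
  if (length(u) > 0 && !anyNA(num)) {
    u[order(num)]   # every value is numeric-like: use numeric order
  } else {
    sort(u)         # otherwise: ordinary sort for the column's type
  }
}

# Toy data, invented for illustration.
Data <- data.frame(code = c("10", "2", "1", "2"),
                   name = c("pear", "apple", "pear", "fig"),
                   stringsAsFactors = FALSE)

cols_to_sort <- 1:2
Data2 <- lapply(Data[cols_to_sort], unisort_base)
Data2$code   # "1" "2" "10"
Data2$name   # "apple" "fig" "pear"
```

The result is a named list rather than a data frame, because each column can have a different number of unique values.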
>>>>>>
>>>>>> Hope this helps,
>>>>>>
>>>>>> Rui Barradas
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> Running a simple syntax set to review entries in dataframe columns.
>>>>>>>> Here is the working code.
>>>>>>>>
>>>>>>>> Data <- read.csv("./input/Source.csv", header = TRUE)
>>>>>>>> describe(Data)
>>>>>>>> summary(Data)
>>>>>>>> unique(Data[1])
>>>>>>>> unique(Data[2])
>>>>>>>> unique(Data[3])
>>>>>>>> unique(Data[4])
>>>>>>>>
>>>>>>>> I would like to sort the unique entries. The data in the various
>>>>>>>> columns are not all defined as numbers; some are text. I realize 1
>>>>>>>> and 10 will not sort properly when the column is not defined as a
>>>>>>>> number, but I want to see what I have in the columns in sorted order.
>>>>>>>>
>>>>>>>> QUESTION
>>>>>>>> What is the best process to sort unique output, please?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.


