[R] Adding SORT to UNIQUE

Duncan Murdoch murdoch.duncan at gmail.com
Tue Dec 21 19:09:14 CET 2021


On 21/12/2021 12:53 p.m., Duncan Murdoch wrote:
> On 21/12/2021 12:29 p.m., Jeff Newmiller wrote:
>> It is a very rational choice, not a design flaw. I don't like every choice they have made for that class, but this one is very solid, and treating data frames as lists of columns consistently helps all of us.
> I think outlawing matrix notation is a really bad idea.  It makes code
> harder to read, and makes it much harder to switch to matrices, which
> sometimes gives a huge speed boost to code.
> 
> For example, John Fox posted an example that showed that operations on
> whole columns of dataframes are about twice as fast using list notation
> as using matrix notation.  But for operating on whole rows, 

... or on individual elements ...

> matrices are
> about 100 times faster than dataframes.  You shouldn't use notation that
> makes the switch to matrices more difficult.
> 
> Duncan Murdoch
> 
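
A rough way to check the speed claims above on your own machine. This is only a sketch: it is not John Fox's original example, the data sizes and the helper name time_it are made up for illustration, and timings vary with hardware and R version, so no figures are claimed here.

```r
# Hypothetical data: 1000 rows x 20 numeric columns (sizes are arbitrary).
set.seed(1)
m  <- matrix(rnorm(1000 * 20), nrow = 1000)
df <- as.data.frame(m)

# Hypothetical helper: elapsed seconds for `reps` calls of a function.
time_it <- function(f, reps) {
  system.time(for (r in seq_len(reps)) f())[["elapsed"]]
}

timings <- c(
  # whole columns of a data frame, list notation
  df_col_list   = time_it(function() for (j in seq_len(ncol(df))) df[[j]], 200),
  # whole columns of a data frame, matrix notation
  df_col_matrix = time_it(function() for (j in seq_len(ncol(df))) df[, j], 200),
  # whole rows of a data frame
  df_row        = time_it(function() for (i in seq_len(nrow(df))) df[i, ], 1),
  # whole rows of the equivalent matrix
  mat_row       = time_it(function() for (i in seq_len(nrow(m))) m[i, ], 1)
)
print(timings)
```

Compare df_col_list with df_col_matrix for the column case, and df_row with mat_row for the row case.
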
>>
>> On December 21, 2021 9:02:56 AM PST, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>> On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:
>>>> Intuitive, perhaps, but noticeably slower. And it doesn't work on tibbles by design. Data frames are lists of columns.
>>>
>>> That's just one of the design flaws in tibbles, but not the worst one.
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>> On December 21, 2021 8:38:35 AM PST, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>>>>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>>>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> sort(unique(Data[1]))
>>>>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
>>>>>>> decreasing)) :
>>>>>>>         undefined columns selected
>>>>>>
>>>>>> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
>>>>>> Data[[1]] for that, so
>>>>>>
>>>>>>        sort(unique(Data[[1]]))
>>>>>
>>>>> Actually, I'd probably recommend
>>>>>
>>>>>      sort(unique(Data[, 1]))
>>>>>
>>>>> instead.  This treats Data as a matrix rather than as a list.
>>>>> Dataframes are lists that look like matrices, but to me the matrix
>>>>> aspect is usually more intuitive.
>>>>>
>>>>> Duncan Murdoch
>>>>>
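
For readers following along, the three indexing forms discussed above can be compared on a small made-up data frame. This is a sketch only: the real Source.csv is not shown in the thread, so the contents of Data here are invented.

```r
# Hypothetical stand-in for Data; the real file is not shown in the thread.
Data <- data.frame(
  id = c("10", "2", "1", "2"),
  n  = c(3, 1, 3, 2),
  stringsAsFactors = FALSE
)

class(Data[1])     # list-style single bracket: a one-column data frame
class(Data[[1]])   # list-style double bracket: the column vector itself
class(Data[, 1])   # matrix-style: also the column vector

# Both vector forms give the same sorted distinct values.
identical(sort(unique(Data[[1]])), sort(unique(Data[, 1])))

# sort() on the one-column data frame fails, as in the error quoted below:
# sort() ends up indexing the data frame by a row permutation, which
# `[.data.frame` with a single index treats as column numbers.
res <- tryCatch(sort(unique(Data[1])), error = function(e) conditionMessage(e))
res
```

The failure depends on there being more distinct rows than columns, which is the usual situation with real data.
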
>>>>>>
>>>>>> I think Rui already pointed out the typo in the quoted text below...
>>>>>>
>>>>>> Duncan Murdoch
>>>>>>
>>>>>>>
>>>>>>> The recommended syntax did not work, as listed above.
>>>>>>>
>>>>>>> What I want is the sort of distinct column output. Again, the column may
>>>>>>> be text or numbers. This is a huge analysis effort with data coming at
>>>>>>> me from many different sources.
>>>>>>>
>>>>>>>
>>>>>>> *Stephen Dawson, DSL*
>>>>>>> /Executive Strategy Consultant/
>>>>>>> Business & Technology
>>>>>>> +1 (865) 804-3454
>>>>>>> http://www.shdawson.com
>>>>>>>
>>>>>>>
>>>>>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>>>>>>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>> Thanks everyone for the replies.
>>>>>>>>>
>>>>>>>>> It is clear one either needs to write a function or put the unique
>>>>>>>>> entries into another dataframe.
>>>>>>>>>
>>>>>>>>> It seems odd R cannot sort a list of unique column entries with ease.
>>>>>>>>> Python and SQL can do it with ease.
>>>>>>>>
>>>>>>>> I've seen several responses that looked pretty simple.  It's hard to
>>>>>>>> beat sort(unique(x)), though there's a fair bit of confusion about
>>>>>>>> what you actually want.  Maybe you should post an example of the code
>>>>>>>> you'd use in Python?
>>>>>>>>
>>>>>>>> Duncan Murdoch
>>>>>>>>
>>>>>>>>>
>>>>>>>>> QUESTION
>>>>>>>>> Is there a simpler means other than the unique function to capture
>>>>>>>>> distinct column entries, then sort that list?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Inline.
>>>>>>>>>>
>>>>>>>>>> Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> sort(unique(Data[[1]]))
>>>>>>>>>>>
>>>>>>>>>>> This syntax provides row numbers, not column values.
>>>>>>>>>>
>>>>>>>>>> This is not right.
>>>>>>>>>> The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
>>>>>>>>>> extracts the column vector.
>>>>>>>>>>
>>>>>>>>>> As for my previous answer, it was not addressing the question, I
>>>>>>>>>> misinterpreted it as being a question on how to sort by numeric order
>>>>>>>>>> when the data is not numeric. Here is a, hopefully, complete answer.
>>>>>>>>>> Still with package stringr.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> cols_to_sort <- 1:4
>>>>>>>>>>
>>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>>>>>>>>         stringr::str_sort(unique(x), numeric = TRUE)
>>>>>>>>>> })
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Or using Avi's suggestion of writing a function to do all the work and
>>>>>>>>>> simplify the lapply loop later,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>>>>>>>>>
>>>>>>>>>>
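
If adding a package is not an option, something similar can be sketched in base R for the simpler case where a text column holds only numbers. The helper name unisort_base and the sample data are assumptions made for illustration, not part of the thread. Unlike stringr::str_sort(numeric = TRUE), this does not handle mixed strings such as "a2" and "a10"; those fall back to plain sort().

```r
# Hypothetical helper, not from the thread: sorted distinct values of one
# column, ordered numerically when every value parses as a number.
unisort_base <- function(vec) {
  u <- unique(vec)
  u <- u[!is.na(u)]                    # sort() drops NA by default; do the same
  if (is.character(u)) {
    num <- suppressWarnings(as.numeric(u))
    if (!anyNA(num)) {
      return(u[order(num)])            # numeric order, values kept as text
    }
  }
  sort(u)                              # ordinary sort for everything else
}

# Made-up data standing in for Source.csv.
Data <- data.frame(
  code = c("10", "2", "1", "2"),
  name = c("pear", "apple", "pear", "fig"),
  stringsAsFactors = FALSE
)

Data2 <- lapply(Data, unisort_base)
Data2$code   # "1" "2" "10"
Data2$name   # "apple" "fig" "pear"
```
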
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> Rui Barradas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Running a simple syntax set to review entries in dataframe columns.
>>>>>>>>>>>> Here is the working code.
>>>>>>>>>>>>
>>>>>>>>>>>> Data <- read.csv("./input/Source.csv", header = TRUE)
>>>>>>>>>>>> describe(Data)
>>>>>>>>>>>> summary(Data)
>>>>>>>>>>>> unique(Data[1])
>>>>>>>>>>>> unique(Data[2])
>>>>>>>>>>>> unique(Data[3])
>>>>>>>>>>>> unique(Data[4])
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to sort the unique entries. The data in the various
>>>>>>>>>>>> columns are not all defined as numbers; some are text. I realize 1 and
>>>>>>>>>>>> 10 will not sort properly, as the column is not defined as a number,
>>>>>>>>>>>> but I want to see what I have in the columns viewed as sorted.
>>>>>>>>>>>>
>>>>>>>>>>>> QUESTION
>>>>>>>>>>>> What is the best process to sort unique output, please?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> ______________________________________________
>>>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


