[R] Adding SORT to UNIQUE

Fox, John
Tue Dec 21 18:28:52 CET 2021


Dear Jeff,

On 2021-12-21, 11:59 AM, "R-help on behalf of Jeff Newmiller" <r-help-bounces using r-project.org on behalf of jdnewmil using dcn.davis.ca.us> wrote:

    Intuitive, perhaps, but noticeably slower.

I think that in most applications, one wouldn't notice the difference; for example:

> library(microbenchmark)
> D <- data.frame(matrix(rnorm(1000*1e6), 1e6, 1000))

> microbenchmark(D[, 1])
Unit: microseconds
   expr   min    lq    mean median     uq    max neval
 D[, 1] 3.321 3.362 3.98561  3.444 3.5875 51.291   100

> microbenchmark(D[[1]])
Unit: microseconds
   expr   min    lq    mean median     uq    max neval
 D[[1]] 1.722 1.763 1.99137  1.804 1.8655 17.876   100
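
Both forms return the same vector for an ordinary data frame, so the choice affects only speed and readability. A minimal sketch with a made-up data frame (the name Small and its columns are illustrative assumptions, not from the thread):

```r
# Illustrative data frame; the name and columns are assumptions, not from the thread
Small <- data.frame(id = c(3, 1, 2, 1), grp = c("b", "a", "b", "a"),
                    stringsAsFactors = FALSE)

# Matrix-style and list-style extraction give the same column vector
same <- identical(Small[, 1], Small[[1]])
print(same)

# so sorting the distinct values works the same either way
by_matrix <- sort(unique(Small[, 1]))
by_list   <- sort(unique(Small[[1]]))
print(by_matrix)
print(by_list)
```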

Best,
 John


    And it doesn't work on tibbles by design. Data frames are lists of columns.
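
The "lists of columns" point can be checked directly. The sketch below uses a toy stand-in for Data (its columns are assumptions, not the poster's file) and shows why Data[1] and Data[[1]] behave differently when passed to sort():

```r
# Toy stand-in for Data; the columns are assumptions, not the poster's file
Data <- data.frame(code = c("b", "a", "b"), n = c(10, 2, 10),
                   stringsAsFactors = FALSE)

is.list(Data)         # a data frame is a list with one element per column
class(Data[1])        # single brackets keep the data frame wrapper
class(Data[[1]])      # double brackets return the column vector itself

# sort() needs the vector, not the one-column data frame
sorted_codes <- sort(unique(Data[[1]]))
print(sorted_codes)
```

Since the list-style Data[[1]] does not depend on matrix-style indexing, it is also the form to reach for when the object might be a tibble.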


    On December 21, 2021 8:38:35 AM PST, Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
    >On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
    >> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
    >>> Thanks for the reply.
    >>>
    >>> sort(unique(Data[1]))
    >>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
    >>> decreasing)) :
    >>>      undefined columns selected
    >> 
    >> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
    >> Data[[1]] for that, so
    >> 
    >>     sort(unique(Data[[1]]))
    >
    >Actually, I'd probably recommend
    >
    >   sort(unique(Data[, 1]))
    >
    >instead.  This treats Data as a matrix rather than as a list. 
    >Dataframes are lists that look like matrices, but to me the matrix 
    >aspect is usually more intuitive.
    >
    >Duncan Murdoch
    >
    >> 
    >> I think Rui already pointed out the typo in the quoted text below...
    >> 
    >> Duncan Murdoch
    >> 
    >>>
    >>> The recommended syntax did not work, as listed above.
    >>>
    >>> What I want is the sort of distinct column output. Again, the column may
    >>> be text or numbers. This is a huge analysis effort with data coming at
    >>> me from many different sources.
    >>>
    >>>
    >>> *Stephen Dawson, DSL*
    >>> /Executive Strategy Consultant/
    >>> Business & Technology
    >>> +1 (865) 804-3454
    >>> http://www.shdawson.com <http://www.shdawson.com>
    >>>
    >>>
    >>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
    >>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
    >>>>> Thanks everyone for the replies.
    >>>>>
    >>>>> It is clear one either needs to write a function or put the unique
    >>>>> entries into another dataframe.
    >>>>>
    >>>>> It seems odd R cannot sort a list of unique column entries with ease.
    >>>>> Python and SQL can do it with ease.
    >>>>
    >>>> I've seen several responses that looked pretty simple.  It's hard to
    >>>> beat sort(unique(x)), though there's a fair bit of confusion about
    >>>> what you actually want.  Maybe you should post an example of the code
    >>>> you'd use in Python?
    >>>>
    >>>> Duncan Murdoch
    >>>>
    >>>>>
    >>>>> QUESTION
    >>>>> Is there a simpler means other than the unique function to capture
    >>>>> distinct column entries, then sort that list?
    >>>>>
    >>>>>
    >>>>>
    >>>>>
    >>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
    >>>>>> Hello,
    >>>>>>
    >>>>>> Inline.
    >>>>>>
    >>>>>> Às 21:18 de 20/12/21, Stephen H. Dawson, DSL via R-help escreveu:
    >>>>>>> Thanks.
    >>>>>>>
    >>>>>>> sort(unique(Data[[1]]))
    >>>>>>>
    >>>>>>> This syntax provides row numbers, not column values.
    >>>>>>
    >>>>>> This is not right.
    >>>>>> The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
    >>>>>> extracts the column vector.
    >>>>>>
    >>>>>> As for my previous answer, it was not addressing the question, I
    >>>>>> misinterpreted it as being a question on how to sort by numeric order
    >>>>>> when the data is not numeric. Here is a, hopefully, complete answer.
    >>>>>> Still with package stringr.
    >>>>>>
    >>>>>>
    >>>>>> cols_to_sort <- 1:4
    >>>>>>
    >>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
    >>>>>>      stringr::str_sort(unique(x), numeric = TRUE)
    >>>>>> })
    >>>>>>
    >>>>>>
    >>>>>> Or using Avi's suggestion of writing a function to do all the work and
    >>>>>> simplify the lapply loop later,
    >>>>>>
    >>>>>>
    >>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
    >>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
    >>>>>>
    >>>>>>
    >>>>>> Hope this helps,
    >>>>>>
    >>>>>> Rui Barradas
    >>>>>>
    >>>>>>
    >>>>>>>
    >>>>>>>
    >>>>>>>
    >>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
    >>>>>>>> Hi,
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> Running a simple syntax set to review entries in dataframe columns.
    >>>>>>>> Here is the working code.
    >>>>>>>>
    >>>>>>>> Data <- read.csv("./input/Source.csv", header=TRUE)
    >>>>>>>> describe(Data)
    >>>>>>>> summary(Data)
    >>>>>>>> unique(Data[1])
    >>>>>>>> unique(Data[2])
    >>>>>>>> unique(Data[3])
    >>>>>>>> unique(Data[4])
    >>>>>>>>
    >>>>>>>> I would like to sort the unique entries. The data in the various
    >>>>>>>> columns are not all defined as numbers; some are text. I realize 1 and
    >>>>>>>> 10 will not sort properly, as the column is not defined as a number,
    >>>>>>>> but want to see what I have in the columns viewed as sorted.
    >>>>>>>>
    >>>>>>>> QUESTION
    >>>>>>>> What is the best process to sort unique output, please?
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> Thanks.
    >>>>>>>
    >>>>>>> ______________________________________________
    >>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
    >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
    >>>>>>> PLEASE do read the posting guide
    >>>>>>> http://www.R-project.org/posting-guide.html
    >>>>>>> and provide commented, minimal, self-contained, reproducible code.
    >>>>>>
    >>>>>

    -- 
    Sent from my phone. Please excuse my brevity.
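
For the original question quoted above (sorting the distinct entries of a column that holds numbers as text, where "1" and "10" do not sort numerically), a minimal base-R sketch is given below. It is an alternative to the stringr::str_sort(numeric = TRUE) approach quoted above, not a reimplementation of it. The helper name unisort_num and the toy vector are assumptions made for illustration.

```r
# Toy stand-in for one column read as text (an assumption, not the poster's data)
x <- c("10", "2", "1", "2", "10", "b", "a")

# Plain character sort of the distinct values: "10" lands before "2"
print(sort(unique(x)))

# Hypothetical helper: distinct values, with numeric-looking entries ordered
# by value first, followed by the remaining entries in alphabetical order
unisort_num <- function(vec) {
  u <- unique(vec)
  num <- suppressWarnings(as.numeric(u))  # NA for entries that are not numbers
  is_num <- !is.na(num)
  c(u[is_num][order(num[is_num])], sort(u[!is_num]))
}

result <- unisort_num(x)
print(result)
```

Applied to several columns, this could be used as lapply(Data[cols_to_sort], unisort_num), in the same way as the lapply call quoted above.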



