[R] Adding SORT to UNIQUE

Stephen H. Dawson, DSL service at shdawson.com
Thu Dec 23 17:38:23 CET 2021


Hi Duncan,


Thanks for the reply. You bring a lot of insight to working with R, and
I look forward to further discussion with you.


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com <http://www.shdawson.com>


On 12/22/21 12:12 PM, Duncan Murdoch wrote:
> On 22/12/2021 12:01 p.m., Stephen H. Dawson, DSL wrote:
>> Data <- read.csv("./input/Source.csv", header = TRUE)
>> v1 <- sort(unique(Data[, 1]))
>> cat(format(v1, justify = "right"), sep = "\n")
>>
>> OK, working with the options you presented. This is the combination
>> where I gain the most benefit.
>>
>> However, there is no listing of a column header with the output of this
>> syntax.
>>
>>   > cat(format(v1, justify = "right"), sep = "\n")
>>    2
>>    3
>>    4
>>    5
>>    6
>>    7
>>    8
>>    9
>> 10
>>   >
>>
>> NOTE
>> The output here is correct (unique) based on the entries from the 
>> column.
>>
>> QUESTION
>> How does one add a text label, something as simple as v1, to the
>> vertical output of this syntax, please?
>
> In this case, you'd just put in cat("v1\n") before the given command.
>
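A minimal sketch of the suggestion above, assuming a small stand-in data frame (the values and the column name `v1` are made up for illustration). The second half takes the label from `names(Data)` instead of typing it by hand, and formats it together with the values so everything lines up on the right.

```r
# Hypothetical stand-in for the first column of Data
Data <- data.frame(v1 = c(10, 2, 3, 2, 9))
v1 <- sort(unique(Data[, 1]))

# Fixed label printed before the values, as suggested above
cat("v1\n")
cat(format(v1, justify = "right"), sep = "\n")

# Label taken from the data frame; formatting label and values together
# pads them to a common width
label <- names(Data)[1]
out_lines <- format(c(label, v1), justify = "right")
cat(out_lines, sep = "\n")
```

Formatting the label and the values in one `format()` call is a design choice: it keeps the header aligned with the column even when the label is wider than the values.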
> In the general case where you want to get the name of the column from 
> the dataframe, I think you'll need to write your own function.  The 
> one Rui just posted looks pretty good.  To get it to print without the 
> row numbers as in the example above, just change it a little in the 
> header and one other line:
>
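> print.sortUnique <- function(x, row.names = FALSE, ...){
>    # pad every column with "" up to the length of the longest one
>    n <- max(lengths(x))
>    y <- lapply(x, \(.x) c(.x, rep("", n - length(.x))))
>    # bind the padded columns into a data frame and restore the names
>    y <- do.call(cbind.data.frame, y)
>    names(y) <- names(x)
>    # row.names = FALSE suppresses the row numbers in the printout
>    print(y, row.names = row.names, ...)
>    invisible(x)
> }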
>
> This will give
>
> > Data2
>  V1 V2 V3 V4
>   3  2  2  1
>   5  4  3  2
>   6  5  4  4
>   7  6  5  5
>   8  9  6  6
>   9 11  8  9
>  12 15  9 10
>  14 16 11 11
>  15 17 14 12
>  18 18 15 13
>  19 19 17 14
>  20    19 16
>        20 18
>           19
>
> with his example data.
>
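A sketch of how such a print method might be wired up end to end. The constructor `sortUnique()` and the sample data are assumptions for illustration; Rui's original function is not quoted in this message. The print method is repeated (with `function(.x)` in place of the `\(.x)` shorthand, which needs R 4.1 or later) so the block runs on its own.

```r
# Print method from the message above, repeated for self-containment
print.sortUnique <- function(x, row.names = FALSE, ...) {
  n <- max(lengths(x))
  y <- lapply(x, function(.x) c(.x, rep("", n - length(.x))))
  y <- do.call(cbind.data.frame, y)
  names(y) <- names(x)
  print(y, row.names = row.names, ...)
  invisible(x)
}

# Hypothetical constructor: sorted unique values per column,
# tagged with the class the print method dispatches on
sortUnique <- function(df) {
  out <- lapply(df, function(col) sort(unique(col)))
  class(out) <- "sortUnique"
  out
}

# Made-up example data
Data <- data.frame(V1 = c(3, 5, 3, 6), V2 = c(2, 2, 4, 4))
Data2 <- sortUnique(Data)
print(Data2)
```

Because the columns are padded with empty strings, the printed values are character; the underlying list in `Data2` keeps the original types.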
> Duncan Murdoch
>
>
>>
>>
>>
>> On 12/22/21 11:13 AM, Stephen H. Dawson, DSL via R-help wrote:
>>> OK, now I get what you are suggesting.
>>>
>>> Much appreciated.
>>>
>>>
>>> Kindest Regards,
>>> *Stephen Dawson, DSL*
>>>
>>>
>>> On 12/22/21 11:08 AM, Duncan Murdoch wrote:
>>>> On 22/12/2021 10:55 a.m., Stephen H. Dawson, DSL wrote:
>>>>> I see.
>>>>>
>>>>> So, we are talking about putting the output into a new dataframe. I
>>>>> was hoping to have the output rendered on screen without another
>>>>> dataframe, but I can live with this option if it must occur.
>>>>>
>>>>> Am I correct that the desired vertical output must first go to a
>>>>> dataframe?
>>>>
>>>> No, that's just one option.  The other 3 don't use dataframes.
>>>>
>>>> Duncan Murdoch
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 12/22/21 10:47 AM, Duncan Murdoch wrote:
>>>>>> On 22/12/2021 10:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> Both syntax options work to render the correct (unique) output.
>>>>>>> However,
>>>>>>> the output is rendered as horizontal. What needs to happen to 
>>>>>>> get the
>>>>>>> output to render vertical, please?
>>>>>>
>>>>>> The result of those expressions is a vector of the same type as the
>>>>>> column, so your question is really about how to get a vector to 
>>>>>> print
>>>>>> one element per line.
>>>>>>
>>>>>> Probably the simplest way is to put the vector in a dataframe (or
>>>>>> matrix, or tibble, depending on which formatting you prefer). For
>>>>>> example,
>>>>>>
>>>>>> > v <- c("red", "green", "blue")
>>>>>> > data.frame(v)
>>>>>>         v
>>>>>> 1   red
>>>>>> 2 green
>>>>>> 3  blue
>>>>>>
>>>>>> If you want a more minimal display, try
>>>>>>
>>>>>> > cat(v, sep = "\n")
>>>>>> red
>>>>>> green
>>>>>> blue
>>>>>>
>>>>>> or
>>>>>>
>>>>>> > cat(format(v, justify = "right"), sep = "\n")
>>>>>>     red
>>>>>> green
>>>>>>    blue
>>>>>>
>>>>>> If you want this to happen when you auto-print the object, you can
>>>>>> give it a class attribute and write a function to print that class,
>>>>>> e.g.
>>>>>>
>>>>>> > class(v) <- "oneperline"
>>>>>> >
>>>>>> > print.oneperline <- function(x, ...)
>>>>>> +     cat(format(x, justify = "right"), sep = "\n")
>>>>>> >
>>>>>> > v
>>>>>>     red
>>>>>> green
>>>>>>    blue
>>>>>>
>>>>>> Duncan Murdoch
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 12/21/21 11:38 AM, Duncan Murdoch wrote:
>>>>>>>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>>>>>>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>>>>>>> Thanks for the reply.
>>>>>>>>>>
>>>>>>>>>> sort(unique(Data[1]))
>>>>>>>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
>>>>>>>>>>         undefined columns selected
>>>>>>>>>
>>>>>>>>> That's the wrong syntax:  Data[1] is not "column one of Data". 
>>>>>>>>> Use
>>>>>>>>> Data[[1]] for that, so
>>>>>>>>>
>>>>>>>>>        sort(unique(Data[[1]]))
>>>>>>>>
>>>>>>>> Actually, I'd probably recommend
>>>>>>>>
>>>>>>>>      sort(unique(Data[, 1]))
>>>>>>>>
>>>>>>>> instead.  This treats Data as a matrix rather than as a list.
>>>>>>>> Dataframes are lists that look like matrices, but to me the matrix
>>>>>>>> aspect is usually more intuitive.
>>>>>>>>
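A short sketch of the three ways of indexing discussed above, using a made-up two-column data frame. It only illustrates which forms return a data frame and which return the column vector.

```r
# Hypothetical example data
Data <- data.frame(V1 = c("b", "a", "b"), V2 = c(3, 1, 3))

sub_df  <- Data[1]     # list-style single bracket: a one-column data frame
col_lst <- Data[[1]]   # list-style double bracket: the column vector
col_mat <- Data[, 1]   # matrix-style indexing: also the column vector

class(sub_df)                 # a data frame, not a vector
identical(col_lst, col_mat)   # both forms extract the same vector
sort(unique(col_mat))         # sort() works on the extracted vector
```

If the matrix-style form should keep the data frame, `Data[, 1, drop = FALSE]` does that.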
>>>>>>>> Duncan Murdoch
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think Rui already pointed out the typo in the quoted text
>>>>>>>>> below...
>>>>>>>>>
>>>>>>>>> Duncan Murdoch
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The recommended syntax did not work, as listed above.
>>>>>>>>>>
>>>>>>>>>> What I want is the sorted, distinct output of a column. Again, the
>>>>>>>>>> column may be text or numbers. This is a huge analysis effort with
>>>>>>>>>> data coming at me from many different sources.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>>>>>>>>>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Thanks everyone for the replies.
>>>>>>>>>>>>
>>>>>>>>>>>> It is clear one either needs to write a function or put the
>>>>>>>>>>>> unique
>>>>>>>>>>>> entries into another dataframe.
>>>>>>>>>>>>
>>>>>>>>>>>> It seems odd R cannot sort a list of unique column entries 
>>>>>>>>>>>> with
>>>>>>>>>>>> ease.
>>>>>>>>>>>> Python and SQL can do it with ease.
>>>>>>>>>>>
>>>>>>>>>>> I've seen several responses that looked pretty simple. It's
>>>>>>>>>>> hard to
>>>>>>>>>>> beat sort(unique(x)), though there's a fair bit of confusion
>>>>>>>>>>> about
>>>>>>>>>>> what you actually want.  Maybe you should post an example of 
>>>>>>>>>>> the
>>>>>>>>>>> code
>>>>>>>>>>> you'd use in Python?
>>>>>>>>>>>
>>>>>>>>>>> Duncan Murdoch
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> QUESTION
>>>>>>>>>>>> Is there a simpler means than the unique function to capture
>>>>>>>>>>>> distinct column entries, then sort that list?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Inline.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 20/12/21 at 21:18, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> sort(unique(Data[[1]]))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This syntax provides row numbers, not column values.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is not right.
>>>>>>>>>>>>> The syntax Data[1] extracts a sub-data.frame, the syntax
>>>>>>>>>>>>> Data[[1]]
>>>>>>>>>>>>> extracts the column vector.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My previous answer did not address the question: I misinterpreted it
>>>>>>>>>>>>> as asking how to sort in numeric order when the data is not numeric.
>>>>>>>>>>>>> Here is a hopefully complete answer, still using package stringr.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> cols_to_sort <- 1:4
>>>>>>>>>>>>>
>>>>>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>>>>>>>>>>>         stringr::str_sort(unique(x), numeric = TRUE)
>>>>>>>>>>>>> })
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Or using Avi's suggestion of writing a function to do all the
>>>>>>>>>>>>> work and
>>>>>>>>>>>>> simplify the lapply loop later,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> unisort <- function(vec, ...)
>>>>>>>>>>>>>     stringr::str_sort(unique(vec), ...)
>>>>>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], unisort, numeric = TRUE)
>>>>>>>>>>>>>
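For readers without stringr, a rough base-R sketch of the same idea. The helper name `unisort_base` and the sample values are assumptions, not from the thread, and it only approximates `numeric = TRUE`: entries that parse as numbers come first in numeric order, and the remaining text entries follow in plain character order.

```r
# Hypothetical base-R helper: unique values, numbers first in numeric order
unisort_base <- function(vec) {
  u <- unique(as.character(vec))
  num <- suppressWarnings(as.numeric(u))  # NA for entries that are not numbers
  # order by numeric value first (NAs last), then by the text itself
  u[order(num, u, method = "radix")]
}

x <- c("10", "2", "b", "1", "a", "2")

# Plain character sort puts "10" before "2"
sort(unique(x), method = "radix")

# Numeric-aware ordering
unisort_base(x)
```

`method = "radix"` is used so the character ordering does not depend on the session locale.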
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rui Barradas
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Running a simple syntax set to review entries in dataframe
>>>>>>>>>>>>>>> columns.
>>>>>>>>>>>>>>> Here is the working code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Data <- read.csv("./input/Source.csv", header = TRUE)
>>>>>>>>>>>>>>> describe(Data)
>>>>>>>>>>>>>>> summary(Data)
>>>>>>>>>>>>>>> unique(Data[1])
>>>>>>>>>>>>>>> unique(Data[2])
>>>>>>>>>>>>>>> unique(Data[3])
>>>>>>>>>>>>>>> unique(Data[4])
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would like to sort the unique entries. The data in the various
>>>>>>>>>>>>>>> columns are not all defined as numbers; some are text. I realize 1
>>>>>>>>>>>>>>> and 10 will not sort properly when a column is not defined as a
>>>>>>>>>>>>>>> number, but I want to see what I have in the columns viewed as sorted.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> QUESTION
>>>>>>>>>>>>>>> What is the best process to sort unique output, please?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ______________________________________________
>>>>>>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and
>>>>>>>>>>>>>> more, see
>>>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible
>>>>>>>>>>>>>> code.


