[R] Adding SORT to UNIQUE

Stephen H. Dawson, DSL
Wed Dec 22 18:01:49 CET 2021


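# Read the source file; the first row holds the column names
Data <- read.csv("./input/Source.csv", header = TRUE)
# Sorted distinct values of the first column
v1 <- sort(unique(Data[, 1]))
# Print one value per line, right-justified
cat(format(v1, justify = "right"), sep = "\n")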

OK, I have been working with the options you presented. The combination
above is the one that gives me the most benefit.

However, the output of this syntax does not include a column header.

 > cat(format(v1, justify = "right"), sep = "\n")
  2
  3
  4
  5
  6
  7
  8
  9
10
 >

NOTE
The output here is correct (unique) based on the entries from the column.

QUESTION
How does one add a text label, something as simple as v1, to the
vertical output of this syntax, please?
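
A minimal sketch of one possible way to do this, offered as an illustration rather than a definitive answer. It assumes the label is just the literal string "v1" and that the label should be right-justified together with the values. The vector 2:10 is a hypothetical stand-in for sort(unique(Data[, 1])), since Source.csv is not available here, and the name `labelled` is made up for the example.

```r
# Hypothetical stand-in for sort(unique(Data[, 1])); the real CSV is not available here.
v1 <- 2:10

# Put the label in front of the values, then format everything together
# so the label and the values are padded to one common width.
labelled <- format(c("v1", v1), justify = "right")

# Print one element per line, label first.
cat(labelled, sep = "\n")
```

This prints "v1" on the first line followed by the values, one per line. Because c("v1", v1) coerces the numbers to character, the label and the values share a single column width.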

*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com


On 12/22/21 11:13 AM, Stephen H. Dawson, DSL via R-help wrote:
> OK, now I get what you are suggesting.
>
> Much appreciated.
>
>
> Kindest Regards,
> *Stephen Dawson, DSL*
>
> On 12/22/21 11:08 AM, Duncan Murdoch wrote:
>> On 22/12/2021 10:55 a.m., Stephen H. Dawson, DSL wrote:
>>> I see.
>>>
>>> So, we are talking about putting the output into a new dataframe. I was
>>> hoping to have the output rendered on screen without another dataframe,
>>> but I can live with this option if it must occur.
>>>
>>> Am I correct that the desired vertical output must first go to a dataframe?
>>
>> No, that's just one option.  The other 3 don't use dataframes.
>>
>> Duncan Murdoch
>>>
>>>
>>> On 12/22/21 10:47 AM, Duncan Murdoch wrote:
>>>> On 22/12/2021 10:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>> Thanks for the reply.
>>>>>
>>>>> Both syntax options work to render the correct (unique) output.
>>>>> However, the output is rendered horizontally. What needs to happen to
>>>>> get the output to render vertically, please?
>>>>
>>>> The result of those expressions is a vector of the same type as the
>>>> column, so your question is really about how to get a vector to print
>>>> one element per line.
>>>>
>>>> Probably the simplest way is to put the vector in a dataframe (or
>>>> matrix, or tibble, depending on which formatting you prefer). For
>>>> example,
>>>>
>>>>>     v <- c("red", "green", "blue")
>>>>>     data.frame(v)
>>>>        v
>>>> 1   red
>>>> 2 green
>>>> 3  blue
>>>>
>>>> If you want a more minimal display, try
>>>>
>>>>> cat(v, sep = "\n")
>>>> red
>>>> green
>>>> blue
>>>>
>>>> or
>>>>
>>>>> cat(format(v, justify = "right"), sep = "\n")
>>>>    red
>>>> green
>>>>   blue
>>>>
>>>> If you want this to happen when you auto-print the object, you can
>>>> give it a class attribute and write a function to print that class, 
>>>> e.g.
>>>>
>>>>>    class(v) <- "oneperline"
>>>>>
>>>>>     print.oneperline <- function(x, ...) cat(format(x, justify = "right"), sep = "\n")
>>>>>
>>>>>     v
>>>>    red
>>>> green
>>>>   blue
>>>>
>>>> Duncan Murdoch
>>>>
>>>>>
>>>>>
>>>>> On 12/21/21 11:38 AM, Duncan Murdoch wrote:
>>>>>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>>>>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>>>>> Thanks for the reply.
>>>>>>>>
>>>>>>>> sort(unique(Data[1]))
>>>>>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
>>>>>>>>        undefined columns selected
>>>>>>>
>>>>>>> That's the wrong syntax:  Data[1] is not "column one of Data". Use
>>>>>>> Data[[1]] for that, so
>>>>>>>
>>>>>>>       sort(unique(Data[[1]]))
>>>>>>
>>>>>> Actually, I'd probably recommend
>>>>>>
>>>>>>     sort(unique(Data[, 1]))
>>>>>>
>>>>>> instead.  This treats Data as a matrix rather than as a list.
>>>>>> Dataframes are lists that look like matrices, but to me the matrix
>>>>>> aspect is usually more intuitive.
>>>>>>
>>>>>> Duncan Murdoch
>>>>>>
>>>>>>>
>>>>>>> I think Rui already pointed out the typo in the quoted text below...
>>>>>>>
>>>>>>> Duncan Murdoch
>>>>>>>
>>>>>>>>
>>>>>>>> The recommended syntax did not work, as listed above.
>>>>>>>>
>>>>>>>> What I want is the sorted distinct column output. Again, the column
>>>>>>>> may be text or numbers. This is a huge analysis effort with data
>>>>>>>> coming at me from many different sources.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>>>>>>>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>> Thanks everyone for the replies.
>>>>>>>>>>
>>>>>>>>>> It is clear one either needs to write a function or put the unique
>>>>>>>>>> entries into another dataframe.
>>>>>>>>>>
>>>>>>>>>> It seems odd R cannot sort a list of unique column entries with ease.
>>>>>>>>>> Python and SQL can do it with ease.
>>>>>>>>>
>>>>>>>>> I've seen several responses that looked pretty simple. It's hard to
>>>>>>>>> beat sort(unique(x)), though there's a fair bit of confusion about
>>>>>>>>> what you actually want.  Maybe you should post an example of the
>>>>>>>>> code you'd use in Python?
>>>>>>>>>
>>>>>>>>> Duncan Murdoch
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> QUESTION
>>>>>>>>>> Is there a simpler means, other than the unique function, to capture
>>>>>>>>>> distinct column entries and then sort that list?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Inline.
>>>>>>>>>>>
>>>>>>>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> sort(unique(Data[[1]]))
>>>>>>>>>>>>
>>>>>>>>>>>> This syntax provides row numbers, not column values.
>>>>>>>>>>>
>>>>>>>>>>> This is not right.
>>>>>>>>>>> The syntax Data[1] extracts a sub-data.frame; the syntax Data[[1]]
>>>>>>>>>>> extracts the column vector.
>>>>>>>>>>>
>>>>>>>>>>> As for my previous answer, it did not address the question; I
>>>>>>>>>>> misinterpreted it as a question about how to sort in numeric order
>>>>>>>>>>> when the data is not numeric. Here is a hopefully complete answer,
>>>>>>>>>>> still using package stringr.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> cols_to_sort <- 1:4
>>>>>>>>>>>
>>>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>>>>>>>>>        stringr::str_sort(unique(x), numeric = TRUE)
>>>>>>>>>>> })
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Or, using Avi's suggestion of writing a function to do all the work
>>>>>>>>>>> and simplify the lapply loop later,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>>>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>
>>>>>>>>>>> Rui Barradas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am running a simple set of commands to review entries in
>>>>>>>>>>>>> dataframe columns. Here is the working code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Data <- read.csv("./input/Source.csv", header=T)
>>>>>>>>>>>>> describe(Data)
>>>>>>>>>>>>> summary(Data)
>>>>>>>>>>>>> unique(Data[1])
>>>>>>>>>>>>> unique(Data[2])
>>>>>>>>>>>>> unique(Data[3])
>>>>>>>>>>>>> unique(Data[4])
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to sort the unique entries. The data in the various
>>>>>>>>>>>>> columns are not all defined as numbers; some are text. I realize
>>>>>>>>>>>>> 1 and 10 will not sort properly when the column is not defined as
>>>>>>>>>>>>> a number, but I want to see what I have in the columns in sorted
>>>>>>>>>>>>> order.
>>>>>>>>>>>>>
>>>>>>>>>>>>> QUESTION
>>>>>>>>>>>>> What is the best process to sort unique output, please?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> ______________________________________________
>>>>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and 
>>>>>>>>>>>> more, see
>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible
>>>>>>>>>>>> code.
>>>>>>>>>>>


