[R] Adding SORT to UNIQUE

Stephen H. Dawson, DSL @erv|ce @end|ng |rom @hd@w@on@com
Wed Dec 22 16:27:01 CET 2021


Bert,


Thanks for the reply.

I did not think to put values back into the same column. Doing so 
would not make sense to me, as it would destroy data integrity. I guess 
adding them as a new column in the same container, in this case a 
dataframe, is possible, but again it is not something I am likely to do.

Either way, thanks for confirming that whatever goes back into a 
dataframe must match it count-wise: every column has to hold the same 
number of values.
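
A minimal sketch of that constraint, assuming a toy dataframe (the names 
dat, uniq, and padded are made up for illustration): the sorted unique 
values sit comfortably in a separate list, and they only fit back into a 
dataframe if every column is padded to a common length.

```r
# Toy data; column names are hypothetical.
dat <- data.frame(a = rep(3:1, 2), b = c(5:1, 5))

# Sorted unique values per column: a list, because the lengths differ.
uniq <- lapply(dat, function(x) sort(unique(x)))

# To get a dataframe again, pad each vector with NA to a common length.
n <- max(lengths(uniq))
padded <- as.data.frame(
  lapply(uniq, function(x) c(x, rep(NA, n - length(x))))
)

print(uniq)
print(padded)
```

Padding with NA is only one possible choice; keeping the list as it is 
avoids inventing values that were never in the data.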

It is nice to have folks on a mailing list who help flesh out the 
difference between what one thinks the syntax does and what it 
actually does.


*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com


On 12/21/21 3:38 PM, Bert Gunter wrote:
> Stephen:
> You seem confused about data frames. sort(unique(...)) has no problem
> sorting individual columns in a data frame (mod the issues about
> mixing numerics and non-numerics that have already been discussed).
> But the problem is that the results can *not* be put back in a data
> frame because, **by definition** all columns in a data frame **must**
> have the same number of values. unique() will change the number of
> values in a column if done column by column, e.g. via lapply() or
> looping over columns. Consequently, if you do this by lapply(), you'll
> get a list back, not a data frame. e.g.
>
>> dat <- data.frame(a = rep(3:1,2), b = c(5:1,5))
>> dat
>    a b
> 1 3 5
> 2 2 4
> 3 1 3
> 4 3 2
> 5 2 1
> 6 1 5
>> ## via lapply
>> dat <- lapply(dat, \(x)sort(unique(x)))
>> dat  ## a list.
> $a
> [1] 1 2 3
>
> $b
> [1] 1 2 3 4 5
>
>> ## Trying to do this with an explicit loop results in an error
>> dat <- data.frame(a = rep(1:3,2), b = c(1:5,5))
>> for(nm in names(dat))dat[[nm]] <- sort(unique(dat[[nm]])) ## error
> Error in `[[<-.data.frame`(`*tmp*`, nm, value = c(1, 2, 3, 4, 5)) :
>    replacement has 5 rows, data has 6
>
> OTOH, unique() has a data.frame method which will give unique *rows*
> (thinking of a data frame as a matrix-like object with a "dim"
> attribute):
>
>> dat <- data.frame(a = c(1,2,1), b = c('a','b','a'))
>> dat
>    a b
> 1 1 a
> 2 2 b
> 3 1 a
>> unique(dat)
>    a b
> 1 1 a
> 2 2 b
>
> There is no sort() method for data frames as this has no obvious
> single interpretation of sorting by whole rows. However, see ?sort for
> an example using ?order to carry out one possible interpretation of
> sorting by rows.
>
> Bert
>
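
One possible reading of "sorting by rows", sketched under assumptions: 
the toy data and the choice to sort by column a first and break ties with 
column b are illustrative only, not the only sensible interpretation.

```r
# Toy data with one duplicated row; names are hypothetical.
dat <- data.frame(a = c(2, 1, 2, 1, 2),
                  b = c("b", "b", "a", "a", "b"))

# unique() on a data frame drops duplicate rows.
u <- unique(dat)

# order() returns the row permutation that sorts by a, ties broken by b.
sorted <- u[order(u$a, u$b), ]
print(sorted)
```

The original row names are kept in the result, which makes it easy to see 
where each row came from.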
>
> On Tue, Dec 21, 2021 at 7:16 AM Stephen H. Dawson, DSL via R-help
> <r-help using r-project.org> wrote:
>> Thanks everyone for the replies.
>>
>> It is clear one either needs to write a function or put the unique
>> entries into another dataframe.
>>
>> It seems odd R cannot sort a list of unique column entries with ease.
>> Python and SQL can do it with ease.
>>
>> QUESTION
>> Is there a simpler means, other than the unique function, to capture
>> distinct column entries and then sort that list?
>>
>>
>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>> Hello,
>>>
>>> Inline.
>>>
>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
>>>> Thanks.
>>>>
>>>> sort(unique(Data[[1]]))
>>>>
>>>> This syntax provides row numbers, not column values.
>>> This is not right.
>>> The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
>>> extracts the column vector.
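
A small sketch of that difference, assuming a made-up one-column data 
frame standing in for Data (the real Source.csv is not shown in the 
thread):

```r
# Toy stand-in for Data; the column name and values are hypothetical.
Data <- data.frame(x = c("10", "2", "1", "2"))

sub <- Data[1]    # single bracket: still a data.frame with one column
col <- Data[[1]]  # double bracket: the column itself, a plain vector

print(class(sub))
print(class(col))

# unique() on the data.frame returns rows, printed with their row names;
# unique() on the vector returns bare values that sort() can handle.
print(unique(sub))
sorted_values <- sort(unique(col))
print(sorted_values)
```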
>>>
>>> As for my previous answer, it was not addressing the question, I
>>> misinterpreted it as being a question on how to sort by numeric order
>>> when the data is not numeric. Here is a, hopefully, complete answer.
>>> Still with package stringr.
>>>
>>>
>>> cols_to_sort <- 1:4
>>>
>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>    stringr::str_sort(unique(x), numeric = TRUE)
>>> })
>>>
>>>
>>> Or using Avi's suggestion of writing a function to do all the work and
>>> simplify the lapply loop later,
>>>
>>>
>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>>
>>>>
>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> Running a simple syntax set to review entries in dataframe columns.
>>>>> Here is the working code.
>>>>>
>>>>> Data <- read.csv("./input/Source.csv", header=T)
>>>>> describe(Data)
>>>>> summary(Data)
>>>>> unique(Data[1])
>>>>> unique(Data[2])
>>>>> unique(Data[3])
>>>>> unique(Data[4])
>>>>>
>>>>> I would like to sort the unique entries. The data in the various
>>>>> columns are not all defined as numbers; some are text. I realize 1
>>>>> and 10 will not sort properly when a column is not defined as a
>>>>> number, but I want to see what I have in the columns in sorted order.
>>>>>
>>>>> QUESTION
>>>>> What is the best process to sort unique output, please?
>>>>>
>>>>>
>>>>> Thanks.
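
To illustrate the "1 and 10" concern, here is a rough base R sketch. It 
is an assumption-laden alternative, not the stringr approach suggested 
elsewhere in the thread; the values in x and the names text_sorted and 
num_sorted are made up.

```r
# Made-up column values stored as text.
x <- c("10", "2", "1", "2", "abc")

# Plain sort on text compares character by character,
# so "10" lands before "2".
text_sorted <- sort(unique(x))
print(text_sorted)

# One base R option: order by numeric value where there is one.
# Non-numeric entries become NA and are placed last.
u <- unique(x)
num <- suppressWarnings(as.numeric(u))
num_sorted <- u[order(num, u, na.last = TRUE)]
print(num_sorted)
```

This only handles entries that are entirely numeric or entirely text; 
mixed strings such as "item10" would need a different approach.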
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.


