[R] Adding SORT to UNIQUE

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Tue Dec 21 21:38:08 CET 2021


Stephen:
You seem confused about data frames. sort(unique(...)) has no problem
sorting individual columns in a data frame (modulo the issues about
mixing numerics and non-numerics that have already been discussed).
The problem is that the results can *not* be put back into a data
frame because, **by definition**, all columns in a data frame **must**
have the same number of values. Applied column by column, e.g. via
lapply() or a loop over the columns, unique() will generally change
the number of values in a column, and by different amounts in different
columns. Consequently, if you do this with lapply(), you'll get a list
back, not a data frame. For example:

> dat <- data.frame(a = rep(3:1,2), b = c(5:1,5))
> dat
  a b
1 3 5
2 2 4
3 1 3
4 3 2
5 2 1
6 1 5
>
> ## via lapply
> dat <- lapply(dat, \(x)sort(unique(x)))
> dat  ## a list.
$a
[1] 1 2 3

$b
[1] 1 2 3 4 5

> ## Trying to do this with an explicit loop results in an error
> dat <- data.frame(a = rep(1:3,2), b = c(1:5,5))
> for(nm in names(dat))dat[[nm]] <- sort(unique(dat[[nm]])) ## error
Error in `[[<-.data.frame`(`*tmp*`, nm, value = c(1, 2, 3, 4, 5)) :
  replacement has 5 rows, data has 6
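
If, as Stephen suggests, the per-column results really must live in a
data frame, one workaround (not something proposed in this thread) is
to pad the shorter columns with NA. A minimal sketch; the helper name
unique_sorted_padded() is hypothetical:

```r
# Hypothetical helper, not from the thread: sorted unique values of each
# column, padded with NA so that all columns have the same length again.
unique_sorted_padded <- function(df) {
  vals <- lapply(df, function(x) sort(unique(x)))  # a list; lengths differ
  n <- max(lengths(vals))                           # length of the longest
  padded <- lapply(vals, function(v) {
    length(v) <- n                                  # extends v with NAs
    v
  })
  as.data.frame(padded)
}

dat <- data.frame(a = rep(3:1, 2), b = c(5:1, 5))
unique_sorted_padded(dat)
```

Column a comes back as 1, 2, 3, NA, NA. Because sort() drops NAs by
default, any NA in the result is padding rather than data. The rows of
such a data frame carry no meaning, though, so for most purposes the
list returned by lapply() is the more natural container.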

On the other hand, unique() has a data.frame method that returns the
unique *rows* (thinking of a data frame as a matrix-like object with a
"dim" attribute):

> dat <- data.frame(a = c(1,2,1), b = c('a','b','a'))
> dat
  a b
1 1 a
2 2 b
3 1 a
> unique(dat)
  a b
1 1 a
2 2 b
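
The same distinction explains why Stephen's original unique(Data[1])
is awkward to sort: Data[1] is a one-column data frame, so unique()
dispatches to the data.frame method and returns unique rows, still as a
data frame, whereas Data[[1]] is the column vector itself. A small
illustration with a toy data frame:

```r
dat <- data.frame(a = c(1, 2, 1), b = c("a", "b", "a"))

u_df  <- unique(dat[1])    # data.frame method: unique rows of a 1-column data frame
u_vec <- unique(dat[[1]])  # default method: unique values of the column vector

class(u_df)   # "data.frame"
class(u_vec)  # "numeric"
sort(u_vec)   # 1 2
```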

There is no sort() method for data frames, as there is no single
obvious interpretation of sorting a data frame by whole rows. However,
see ?sort for an example that uses order() to carry out one possible
interpretation of sorting by rows.
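
One such interpretation, sketched here under the assumption that rows
should be ordered by the first column with ties broken by the later
columns, combines unique() with order(). Passing the columns through
do.call() is one idiom for this, not the only one:

```r
dat <- data.frame(a = c(2, 1, 2, 1), b = c("y", "x", "x", "x"))

u <- unique(dat)  # unique rows: (2, "y"), (1, "x"), (2, "x")

# order() accepts several sort keys; do.call() passes each column as one key,
# so rows are ordered by a, with ties broken by b. unname() keeps column names
# from being matched against order()'s own arguments (e.g. "decreasing").
sorted <- u[do.call(order, unname(as.list(u))), ]
rownames(sorted) <- NULL  # optional: renumber rows 1..n
sorted
```

The result has rows (1, "x"), (2, "x"), (2, "y").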

Bert


On Tue, Dec 21, 2021 at 7:16 AM Stephen H. Dawson, DSL via R-help
<r-help using r-project.org> wrote:
>
> Thanks everyone for the replies.
>
> It is clear one either needs to write a function or put the unique
> entries into another data frame.
>
> It seems odd that R cannot easily sort a list of unique column entries.
> Python and SQL can do it with ease.
>
> QUESTION
> Is there a simpler means than the unique function to capture
> distinct column entries and then sort that list?
>
>
> *Stephen Dawson, DSL*
> /Executive Strategy Consultant/
> Business & Technology
> +1 (865) 804-3454
> http://www.shdawson.com <http://www.shdawson.com>
>
>
> On 12/20/21 5:53 PM, Rui Barradas wrote:
> > Hello,
> >
> > Inline.
> >
> > On 20/12/21 at 21:18, Stephen H. Dawson, DSL via R-help wrote:
> >> Thanks.
> >>
> >> sort(unique(Data[[1]]))
> >>
> >> This syntax provides row numbers, not column values.
> >
> > This is not right.
> > The syntax Data[1] extracts a sub-data.frame; the syntax Data[[1]]
> > extracts the column vector.
> >
> > As for my previous answer, it did not address the question: I
> > misinterpreted it as a question on how to sort in numeric order
> > when the data are not numeric. Here is, hopefully, a complete answer,
> > still using package stringr.
> >
> >
> > cols_to_sort <- 1:4
> >
> > Data2 <- lapply(Data[cols_to_sort], \(x){
> >   stringr::str_sort(unique(x), numeric = TRUE)
> > })
> >
> >
> > Or using Avi's suggestion of writing a function to do all the work and
> > simplify the lapply loop later,
> >
> >
> > unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
> > Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> >
> >>
> >> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
> >>> Hi,
> >>>
> >>>
> >>> I am running a simple set of commands to review the entries in data
> >>> frame columns. Here is the working code.
> >>>
> >>> Data <- read.csv("./input/Source.csv", header=T)
> >>> describe(Data)
> >>> summary(Data)
> >>> unique(Data[1])
> >>> unique(Data[2])
> >>> unique(Data[3])
> >>> unique(Data[4])
> >>>
> >>> I would like to sort the unique entries. The data in the various
> >>> columns are not defined as numbers; they also contain text. I realize
> >>> 1 and 10 will not sort properly, as the column is not defined as a
> >>> number, but I want to see what I have in the columns, viewed as sorted.
> >>>
> >>> QUESTION
> >>> What is the best process to sort unique output, please?
> >>>
> >>>
> >>> Thanks.
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >