[R] difference between unique() and !duplicated()

T.Lok T.Lok at rug.nl
Thu Sep 13 11:47:50 CEST 2007

Yesterday I spent the whole day trying to work out how to get 
the maximum value of "y" for every unique value of "x" 
from the dataframe "test". In the R Book (Crawley, 2007) 
an example of this can be found on page 121. I tried to do 
it this way, but I failed.

In the end, I figured out how to get it working (first 
order, and afterwards use !duplicated()). My question is: 
why does it not work with the unique() function as on p. 121 
(i.e. test[rev(order(x)),][unique(y),])?

As a simple example, I used the following code:

> x <- c("A","A","B","B","C","C","D")
> y <- c(1,2,1,1,2,3,1)
> z <- c("yes","yes","no","yes","no","no","no")
> test <- data.frame(x,y,z)
> test

   x y   z
1 A 1 yes
2 A 2 yes
3 B 1  no
4 B 1 yes
5 C 2  no
6 C 3  no
7 D 1  no

> test[rev(order(test$y, test$z)),][unique(test$x),]

   x y   z
6 C 3  no
2 A 2 yes
5 C 2  no
4 B 1 yes

# this clearly does not give a unique value for x, since
# there are 2 C's and no D!
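
A minimal sketch of what seems to be happening in this first attempt. unique() returns the distinct values themselves (A, B, C, D), not row positions, and when those values are a factor, the row index is taken from the factor's integer codes. The `stringsAsFactors = TRUE` argument is an assumption added here so that current R versions store x and z as factors, as in the output shown above.

```r
# Assumption: stringsAsFactors = TRUE makes x and z factors,
# matching the behaviour behind the output shown above.
test <- data.frame(
  x = c("A", "A", "B", "B", "C", "C", "D"),
  y = c(1, 2, 1, 1, 2, 3, 1),
  z = c("yes", "yes", "no", "yes", "no", "no", "no"),
  stringsAsFactors = TRUE
)

u <- unique(test$x)       # the distinct values: A B C D (a factor)
codes <- as.integer(u)    # the integer codes behind that factor

sorted <- test[rev(order(test$y, test$z)), ]

# A factor used as a row index is treated as its integer codes,
# so this picks rows 1 to 4 of the sorted data frame.
by_factor <- sorted[u, ]
by_position <- sorted[codes, ]

print(codes)
print(by_factor)
```

Under that reading, the result is simply the first four rows of the sorted data frame, which explains the two C's and the missing D.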

> test[rev(order(test$y, test$z)),][!duplicated(test$x),]

   x y   z
6 C 3  no
5 C 2  no
1 A 1 yes
3 B 1  no

# this also doesn't work
# then I thought, maybe first use the order() function,
# then unique()
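
A sketch of why the combined order() and !duplicated() attempt above may pick the wrong rows: !duplicated(test$x) is computed on x in its original row order, but the resulting logical vector is then applied by position to the re-ordered data frame. The names `keep`, `o` and `sorted` are introduced here for illustration only, and `stringsAsFactors = TRUE` is an assumption to match the output above.

```r
# Assumption: stringsAsFactors = TRUE, to match the output shown above.
test <- data.frame(
  x = c("A", "A", "B", "B", "C", "C", "D"),
  y = c(1, 2, 1, 1, 2, 3, 1),
  z = c("yes", "yes", "no", "yes", "no", "no", "no"),
  stringsAsFactors = TRUE
)

# Computed on the ORIGINAL order of x (A A B B C C D)
keep <- !duplicated(test$x)

o <- rev(order(test$y, test$z))
sorted <- test[o, ]

# Side by side: the flags no longer line up with the sorted x values
print(data.frame(position = seq_along(keep), sorted_x = sorted$x, keep = keep))

# The flags select positions 1, 3, 5 and 7 of the sorted data frame
mismatched <- sorted[keep, ]
print(mismatched)
```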

> test[rev(order(test$y, test$z)),]

   x y   z
6 C 3  no
2 A 2 yes
5 C 2  no
4 B 1 yes
1 A 1 yes
7 D 1  no
3 B 1  no

> test1 <- test[rev(order(test$y, test$z)),]
> test1[unique(test1$x),]

   x y   z
5 C 2  no
6 C 3  no
2 A 2 yes
4 B 1 yes

# still no unique values for x

> test1[!duplicated(test1$x),]

   x y   z
6 C 3  no
2 A 2 yes
4 B 1 yes
7 D 1  no

# finally I get unique values for x, for the maximum value
# of y (and z). But why does this not work when giving the
# order() and !duplicated() commands in a single expression?
# And why does only !duplicated() work, and not unique()?
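
One possible single-expression form, offered as a sketch rather than a definitive answer: apply the same ordering to x before calling duplicated(), so that the logical vector lines up with the re-ordered rows. The names `o`, `one_step` and `max_y` are hypothetical, and the tapply() call is only a cross-check of the per-group maximum of y.

```r
# Assumption: stringsAsFactors = TRUE, to match the output shown above.
test <- data.frame(
  x = c("A", "A", "B", "B", "C", "C", "D"),
  y = c(1, 2, 1, 1, 2, 3, 1),
  z = c("yes", "yes", "no", "yes", "no", "no", "no"),
  stringsAsFactors = TRUE
)

o <- rev(order(test$y, test$z))

# duplicated() now sees x in the same order as the rows it filters
one_step <- test[o, ][!duplicated(test$x[o]), ]
print(one_step)

# Cross-check: the maximum of y within each level of x
max_y <- tapply(test$y, test$x, max)
print(max_y)
```

The rows kept this way are the same as those from the two-step test1 version above.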
