[Rd] Ordering of values returned by unique

Tony Plate tplate at blackmesacapital.com
Wed Sep 29 18:09:18 CEST 2004

AFAIK, it has always worked that way in S-plus and R.  Furthermore, the 
documentation in R for 'unique' says that it removes duplicated 
elements.  This does seem to leave open the possibility that an element other 
than the first of a set of duplicates is retained, which could mess up the 
order.  However, the documentation for 'duplicated' is clearer: it says 
that 'duplicated' identifies duplicates of earlier elements.  Also, the 
examples for 'duplicated' note that x[!duplicated(x)] == unique(x).
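
For illustration, the relationship between 'duplicated' and 'unique' can be checked directly.  This is only a small sketch using the vector from the original question; the names first_seen and y are made up for this example, and it shows the behaviour of the default method on atomic vectors, not a guarantee for every method.

```r
# The vector from the original question.
x <- c(2, 2, 1, 2)

# duplicated() marks elements that repeat an EARLIER element,
# so the first occurrence of each value is never marked.
duplicated(x)

# Dropping the marked elements keeps first occurrences in their original order.
first_seen <- x[!duplicated(x)]

# Compare with unique(); identical() tests the whole vector at once.
identical(first_seen, unique(x))

# Same check on a character vector (y is a hypothetical example).
y <- c("b", "a", "b", "c", "a")
identical(y[!duplicated(y)], unique(y))
```

Because 'duplicated' only ever flags later copies, subsetting with !duplicated(x) cannot reorder anything, which is why the identity pins down the order of unique(x).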

I depend on this all the time, so I also checked some references.  In the 
Blue book the documentation for the functions unique and duplicated is 
combined and implies the above.  In MASS 4th Ed, the page referred to by 
the index entry for 'unique' (p48, #9 in my copy) states that 'unique' 
removes duplicates as identified by 'duplicated', which implies that the 
order of retained elements is not changed.  The Green book has no index 
entry for 'unique'.  In S-plus the implementation of unique.default(x) uses 

So, I think the evidence is pretty strong that unique(x) will always return 
elements in the same order as they first appear in x.  But it would be nice 
if the documentation for 'unique' explicitly stated that this is the 
behavior for all methods.  (It does state this for the array method for 

-- Tony Plate

At Wednesday 09:17 AM 9/29/2004, Witold Eryk Wolski wrote:
>Is the ordering of the values returned something I can rely on, a 
>form of a standard, such that a function called unique in R (in future 
>versions) will return the unique elements in order of their first occurrence?
> > x<-c(2,2,1,2)
> > unique(x)
>[1] 2 1
>It seems not to be the standard, e.g. in Matlab:
> >> x=[2,2,1,2]
>x =
>     2     2     1     2
> >> unique(x)
>ans =
>     1     2
>I just noted it because the way it works now is extremely 
>useful for some applications (e.g. tree traversal), so I use it in a 
>script. But I am a little worried whether I can rely on this behaviour in 
>future versions. And furthermore, can I assume that someone reading the 
>code will think that it works that way?
>Or is it better to define an additional function?
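>    # collect the distinct values of x in order of first occurrence
>    res <- rep(NA, length(unique(x)))
>    count <- 0
>    for (i in x)
>    {
>        # append i only if it has not been seen yet
>        if (!i %in% res)
>        {
>            count <- count + 1
>            res[count] <- i
>        }
>    }
>    res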
>Dipl. bio-chem. Witold Eryk Wolski
>MPI-Moleculare Genetic
>Ihnestrasse 63-73 14195 Berlin           _
>tel: 0049-30-83875219                   'v'
>http://www.molgen.mpg.de/~wolski       /   \
>mail: witek96 at users.sourceforge.net  ---W-W----
>      wolski at molgen.mpg.de