[R] Tagging identical rows of a matrix

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri May 14 21:23:43 CEST 2004


The trick is to collapse each row into a single character string, as Andy
Liaw pointed out and as unique.matrix (and unique.data.frame) do.  Once you
have the collapsed rows as a character vector, unique() and match() will do
a fast job (via internal hashing). (Andy's solution via factor() is the same
thing with a bit of extra baggage.)
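
A minimal sketch of that approach, using the example matrix from the
original question.  The "\r" separator and the names 'keys' and 'grp' are
illustrative choices only; any separator that cannot occur inside a
formatted value should do.

```r
# Example matrix from the original question
mat <- rbind(a = c(1, 2), b = c(1, 3), c = c(1, 2),
             d = c(1, 2), e = c(1, 3), f = c(2, 1))

# Collapse each row into one string ("\r" is an assumed, arbitrary separator)
keys <- apply(mat, 1, paste, collapse = "\r")

# Position of each row's key among the unique keys, in order of first appearance
grp <- match(keys, unique(keys))
grp
```

For this matrix the result should be the requested c(1,2,1,1,2,3).  Note
that rows are compared through their character representation, so numeric
values that format identically are treated as equal.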

On 14 May 2004, Douglas Bates wrote:

> Scott Waichler <scott.waichler at pnl.gov> writes:
> 
> > I would like to generate a vector having the same length
> > as the number of rows in a matrix.  The vector should contain
> > an integer indicating the "group" of the row, where identical
> > matrix rows are in a group, and a unique row has a unique integer.
> > Thus, for
> > 
> > a <- c(1,2)
> > b <- c(1,3)
> > c <- c(1,2)
> > d <- c(1,2)
> > e <- c(1,3)
> > f <- c(2,1)
> > mat <- rbind(a,b,c,d,e,f)
> > 
> > I would like to get the vector c(1,2,1,1,2,3).  I know dist() gives
> > part of the answer, but I can't figure out how to use it for
> > this purpose without doing a lot of looping.  I need to apply this
> > to matrices up to ~100000 rows.
> 
> I believe you want to start with unique which, when applied to a
> matrix, provides the unique rows.
> 
> > unique(mat)
>   [,1] [,2]
> a    1    2
> b    1    3
> f    2    1
> 
> I'm sure others will be able to provide clever ways of doing the
> matching against the unique rows.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



