[R] Count unique rows/columns in a matrix

Charles C. Berry cberry at tajo.ucsd.edu
Sat Jan 12 22:08:11 CET 2008


On Sat, 12 Jan 2008, Gabor Csardi wrote:

> On Sat, Jan 12, 2008 at 12:35:47PM -0500, John Kane wrote:
>> I definitely did not read it that way but that may
>> have been my fault.  That table approach is quite
>> nice!
>>
>> Using it, you could just rebuild the vectors from the
>> names. Does this do more or less what you want?
>
> John, thanks. Still not good enough. :( The problem is not that the
> result was in string format, but that the real values are not
> compared, only the values rounded to six (?) decimals. I know this is only
> the default and more could be done by setting some parameters
> (probably options(digits) is enough), but then it is not very efficient,
> since instead of comparing 8 byte doubles i'll be comparing quite long
> strings for every single number in the matrix. This seems quite a hack
> to me.
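
A small illustration of the concern above. As far as I know, paste() goes through as.character(), which keeps up to 15 significant digits rather than six, but the point stands either way: two doubles that differ only in their last bits get the same string. The variable names below are made up for the example.

```r
## two doubles that differ only in the last bits
a <- 0.1 + 0.2
b <- 0.3

exact.equal  <- (a == b)                # compares the stored doubles
string.equal <- (paste(a) == paste(b))  # compares the text forms

## a paste()-based key therefore merges these two rows
m    <- rbind(c(a, 1), c(b, 1))
keys <- apply(m, 1, paste, collapse=",")
n.string.groups <- length(table(keys))    # groups seen by the table approach
n.exact.groups  <- length(unique(m[, 1])) # distinct values in column 1
```

Here exact.equal is FALSE while string.equal is TRUE, so the table approach reports one group where an exact comparison finds two.
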
>
> I'm thinking about the following solution. We hash every row/column
> of the matrix, then sort the hashed values, and compare only those
> rows/columns for which the hash values are the same. (With the proper
> comparison, i.e. via "==" or all.equal.)
>
> Of course i'm not completely sure that this is faster than comparing
> long strings, but i'll give it a try. I have quite big matrices,
> that's why i need an efficient solution.
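
A rough sketch of the hashing idea, only to make it concrete. Everything here is an assumption: count.rows.hashed is a hypothetical name, the "hash" is just a weighted row sum with arbitrary weights, and split() does the grouping instead of an explicit sort. Rows that share a hash are still compared exactly with "!=", so collisions cost time but not correctness.

```r
## hypothetical helper, not from any package
count.rows.hashed <- function(x)
{
  ## toy hash: weighted sum of each row (the weights are arbitrary).
  ## identical rows always get identical hashes; different rows may collide
  w <- sqrt(2 + seq_len(ncol(x)))
  h <- rowSums(x * rep(w, each=nrow(x)))

  ## row indices grouped by hash value
  buckets <- split(seq_len(nrow(x)), h)

  first  <- integer(0)   # index of the first row of each distinct group
  counts <- integer(0)   # how many rows equal that first row
  for (idx in buckets) {
    while (length(idx) > 0) {
      ref  <- x[idx[1], ]
      ## exact comparison of every row in the bucket against ref
      same <- colSums(t(x[idx, , drop=FALSE]) != ref) == 0
      first  <- c(first, idx[1])
      counts <- c(counts, sum(same))
      idx <- idx[!same]
    }
  }
  cbind(counts, x[first, , drop=FALSE])
}

## John's example matrix, quoted further down
X <- matrix(c(1,2,3, 1,2,3, 4,5,6, 1,3,2, 4,5,6, 1,1,1), 6, 3, byrow=TRUE)
res.hashed <- count.rows.hashed(X)
```

Whether this beats sorting the rows directly is not something I have timed.
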
>

Gabor,

Try this. Order the matrix rows, compare adjacent rows, and run length 
encode the logical vector of comparisons. Decode the rle() result to get 
the counts, use the logical vector comparing adjacent rows to identify the 
unique rows, and cbind() them together. Like this:

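count.rows <-
   function(x)
   {
     ## order the rows lexicographically, column by column
     order.x <- do.call(order,as.data.frame(x))
     ## TRUE where a sorted row equals the row before it
     ## (drop=FALSE keeps the matrix shape when only one row is left)
     equal.to.previous <-
       rowSums(x[tail(order.x,-1),,drop=FALSE] !=
               x[head(order.x,-1),,drop=FALSE])==0
     tf.runs <- rle(equal.to.previous)
     ## a run of n TRUEs closes a group of n+1 rows; a run of n FALSEs
     ## starts n new groups (the leading 1 stands for the first row)
     counts <- c(1,
                 unlist(mapply( function(x,y) if (y) x+1 else (rep(1,x)),
                               tf.runs$lengths, tf.runs$values )))
     ## drop each provisional 1 that is followed by its group's full count
     counts <- counts[ c(diff(counts) <= 0, TRUE ) ]
     unique.rows <- which( c(TRUE, !equal.to.previous ) )
     cbind( counts, x[order.x[ unique.rows ], ,drop=FALSE] )
   }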
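
To see the intermediate steps, here is a walk-through on John's small matrix (quoted further down). It is a sketch rather than a second implementation: the names sorted, starts and res are made up here, and the counts come from a diff() shortcut instead of the rle() decoding in count.rows.

```r
## John's example matrix
X <- matrix(c(1,2,3, 1,2,3, 4,5,6, 1,3,2, 4,5,6, 1,1,1), 6, 3, byrow=TRUE)

## step 1: sort the rows lexicographically
order.x <- do.call(order, as.data.frame(X))
sorted  <- X[order.x, , drop=FALSE]

## step 2: compare each sorted row with the one before it
n <- nrow(sorted)
equal.to.previous <-
  rowSums(sorted[-1, , drop=FALSE] != sorted[-n, , drop=FALSE]) == 0

## step 3: a group starts at row 1 and wherever the comparison is FALSE
starts <- which(c(TRUE, !equal.to.previous))

## step 4 (shortcut, not the rle() decoding used in count.rows):
## the size of each group is the distance to the next start
counts <- diff(c(starts, n + 1))

res <- cbind(counts, sorted[starts, , drop=FALSE])
res
```

For this X the four distinct rows are (1,1,1), (1,2,3), (1,3,2) and (4,5,6), with counts 1, 2, 1 and 2.
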

HTH,

Chuck

> (I'm sending this to the list, because someone else was also
> interested, but i lost his email address.)
>
> Gabor
>
>> X<-matrix(c(1,2,3,1,2,3,4,5,6,1,3,2,4,5,6,1,1,1),6,3,byrow=TRUE)
>> xx <-table(apply(X, 1, paste, collapse=","))
>> hh <- names(xx)
>> nnk  <-(strsplit(hh, ","))
>> kkn  <- lapply(nnk, as.numeric)
>> df1 <-t(as.data.frame(kkn))
>> cbind(df1,xx)
>>
> [...]
>
> -- 
> Csardi Gabor <csardi at rmki.kfki.hu>    UNIL DGM
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



