[Rd] unique.matrix issue [Was: Anomaly with unique and match]

jochen laubrock jochen.laubrock at gmail.com
Mon Mar 28 16:54:47 CEST 2011


Still, from a user's perspective this behavior is somewhat irritating. Wouldn't it be better to rewrite unique.matrix to use formatC or sprintf instead of as.character, on which paste in line 9 of unique.matrix implicitly relies (at least in R version 2.12.2, 2011-02-25)?

For example, use

temp <- apply(x, MARGIN, formatC, digits=324, format="f")

instead of

temp <- apply(x, MARGIN, function(x) paste(x, collapse = "\r"))

Don't know whether this affects performance, though.
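
To make the idea concrete, here is a minimal sketch of building the row keys from the exact hexadecimal representation (sprintf with "%a") rather than from as.character. The helper name row_keys is made up for illustration and is not part of base R; the sketch assumes a numeric (double) matrix and is not meant as the actual patch for unique.matrix.

```r
# Hypothetical helper (not part of base R): one key per row/column,
# built from the exact hexadecimal form of each double.
row_keys <- function(x, MARGIN = 1) {
  apply(x, MARGIN, function(v) paste(sprintf("%a", v), collapse = "\r"))
}

# Simon's example: two values that differ only in the last bit
x <- matrix(c(1, 1 + 0.2e-15), 2)

# Keys built the as.character() way (what paste does implicitly)
keys_chr <- apply(x, 1, function(v) paste(v, collapse = "\r"))
# Keys built from "%a"
keys_hex <- row_keys(x)

dup_chr <- anyDuplicated(keys_chr) > 0   # TRUE: both keys are "1"
dup_hex <- anyDuplicated(keys_hex) > 0   # FALSE: the keys differ

# Rows kept when duplicates are judged on the hex keys
x_unique <- x[!duplicated(keys_hex), , drop = FALSE]
```

One caveat with this kind of key: "%a" also distinguishes 0 from -0, which unique() on a plain vector does not, so it is not a drop-in replacement without further thought.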

Sorry to chime in late. 
Cheers, 
Jochen


> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
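
P.S. For the "map to unique" operation asked about in the quoted thread below, one option that sidesteps the string comparison is to drop the dim attribute before calling unique and match. A minimal sketch, assuming a one-column numeric matrix; v and m are made-up stand-ins for temp1 and temp2:

```r
# Values that differ only in the last bit, as in Simon's example
v <- c(1, 1 + 0.2e-15, 1)
m <- as.matrix(v)              # one-column matrix, like temp2

# as.vector() drops the dim attribute, so the default (vector)
# methods of unique() and match() compare the doubles directly
idx <- match(as.vector(m), unique(as.vector(m)))
idx
```

This is in the same spirit as not calling match on the one-column matrix in the first place.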


On Mar 9, 2011, at 20:11, Simon Urbanek wrote:

> match() is a red herring here -- it is really a very specific thing that has to do with the fact that you're running unique() on a matrix. Also it's much easier to reproduce:
> 
>> x=c(1,1+0.2e-15)
>> x
> [1] 1 1
>> sprintf("%a",x)
> [1] "0x1p+0"               "0x1.0000000000001p+0"
>> unique(x)
> [1] 1 1
>> sprintf("%a",unique(x))
> [1] "0x1p+0"               "0x1.0000000000001p+0"
>> unique(matrix(x,2))
>     [,1]
> [1,]    1
> 
> This happens because unique.matrix compares string representations: it has to take into account all values of a row/column, so it pastes them into one string, and for these two numbers the string is the same:
>> as.character(x)
> [1] "1" "1"
> 
> Cheers,
> Simon
> 
> 
> On Mar 9, 2011, at 9:48 AM, Terry Therneau wrote:
> 
>> I stumbled onto this working on an update to coxph.  The last 6 lines
>> below are the question, the rest create a test data set.
>> 
>> tmt585% R
>> R version 2.12.2 (2011-02-25)
>> Copyright (C) 2011 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>> 
>> # Lines of code from survival/tests/singtest.R
>>> library(survival)
>> Loading required package: splines
>>> test1 <- data.frame(time=  c(4, 3,1,1,2,2,3),
>> +     status=c(1,NA,1,0,1,1,0),
>> +     x=     c(0, 2,1,1,1,0,0))
>>> 
>>> temp <- rep(0:3, rep(7,4))
>>> 
>>> stest <- data.frame(start  = 10*temp,
>> +     stop   = 10*temp + test1$time,
>> +     status = rep(test1$status,4),
>> +     x      = c(test1$x+ 1:7, rep(test1$x,3)),
>> +     epoch  = rep(1:4, rep(7,4)))
>>> 
>>> fit1 <- coxph(Surv(start, stop, status) ~ x * factor(epoch), stest)
>> 
>> ## New lines
>>> temp1 <- fit1$linear.predictor
>>> temp2 <- as.matrix(temp1)
>>> match(temp1, unique(temp1))
>> [1] 1 2 3 4 4 5 6 7 7 7 6 6 6 8 8 8 6 6 6 9 9 9 6 6
>>> match(temp2, unique(temp2))
>> [1]  1  2  3  4  4  5  6  7  7  7  6  6  6 NA NA NA  6  6  6  8  8  8  6  6
>> 
>> -----------------------
>> 
>> I've solved it for my code by not calling match on a 1 column vector.
>> In general, however, should I be using some other paradigm for this "map
>> to unique" operation?  For example match(as.character(x),
>> unique(as.character(x)))?
>> 
>> Terry T
>> 
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 