[Rd] complex NA's match(), etc: not back-compatible change proposal

Martin Maechler maechler at stat.math.ethz.ch
Wed May 11 10:00:44 CEST 2016


>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Tue, 10 May 2016 16:08:39 +0200 writes:

    > This is an RFC / announcement related to the 2nd part of PR#16885
    > https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16885
    > about  complex NA's.

    > The (somewhat rare) incompatibility in R 3.3.0's match() behavior for the
    > case of complex numbers with NA & NaN's {which has been fixed for R 3.3.0
    > patched in the meantime} triggered some more comprehensive "research".

    > I found that we have had a long-standing inconsistency, at least between the
    > documented and the real behavior.  I am claiming that the documented
    > behavior is desirable and hence R's current "real" behavior is buggy, and
    > I am proposing to change it, in R-devel (to be 3.4.0) for now.

After the  "roaring unanimous" assent  (one private msg
      encouraging me to go forward, no dissenting voice, hence an
      "odds ratio" of  +Inf  in favor ;-),
I have now committed my proposal to R-devel (svn rev. 70597), and
some of us will see the effect in package space within a
day or so, in the CRAN checks against R-devel (not for
    Bioconductor AFAIK; their checks use R-devel only when it is less
    than ca. 6 months from release).

It's still worthwhile to discuss the issue if you come late
to it, notably because (paraphrasing Dirk on the R-package-devel list)
the release of 3.4.0 is almost a year away, so now is the
best time to tinker with the API, in other words, to consider breaking
rarely used legacy APIs.

Martin


    > In help(match) we have been saying

    > |  Exactly what matches what is to some extent a matter of definition.
    > |  For all types, \code{NA} matches \code{NA} and no other value.
    > |  For real and complex values, \code{NaN} values are regarded
    > |  as matching any other \code{NaN} value, but not matching \code{NA}.

    > for at least 10 years.  But we don't do that at all in the
    > complex case (and AFAIK never got a bug report about it).
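
For comparison, the documented rule is easy to see in the real (double) case, where match() already follows it. A minimal sketch; the vectors are made up for illustration:

```r
## NA matches only NA, NaN matches only NaN (real-valued case).
x     <- c(NA, NaN, 1)
table <- c(NaN, NA, 1)
match(x, table)
## [1] 2 1 3
```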

    > Also, e.g., print(.) or format(.) do simply use  "NA" for all
    > the different complex NA-containing numbers, where OTOH,
    > non-NA NaN's { <=>  !is.nan(z) & is.na(z) }
    > in format() or print() do show the NaN in real and/or imaginary
    > parts; for an example, look at the "format" column of the matrix
    > below, after 'print(cbind' ...

    > The current match(), and duplicated() and unique(), which are based on the same
    > C code, *do* distinguish almost all complex NA / NaN's, which is
    > NOT according to the documentation. I have found that this is just because
    > our hashing function for the complex case, chash() in R/src/main/unique.c,
    > is bogus in the sense that it is not compatible with the above documentation
    > and also not with the cequal() function (in the same file, unique.c) for checking
    > equality of complex numbers.

    > As I have found, a *simplified* version of the chash() function
    > to make it compatible with cequal() does solve all the problems I've
    > indicated,  and the current plan is to commit that change --- after some
    > discussion time, here on R-devel ---  to the code base.
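
The requirement that a hash be compatible with the equality test can be illustrated at R level. This is only a rough sketch, not the C code in unique.c: cplx_key() is a hypothetical helper, and the rule it encodes (any NA part gives the "NA" class, otherwise any NaN part gives the "NaN" class) is my reading of the documentation and of the proposed results shown further down. Keying ordinary values by pasting their parts is a toy shortcut, not exact hashing.

```r
## Hypothetical helper (not part of R): one key per documented equivalence class.
cplx_key <- function(z) {
    re <- Re(z); im <- Im(z)
    is.NA  <- function(x) is.na(x) & !is.nan(x)   # NA, but not NaN
    hasNA  <- is.NA(re)  | is.NA(im)
    hasNaN <- is.nan(re) | is.nan(im)
    key <- paste(re, im)     # toy key for ordinary (non-missing) values
    key[hasNaN] <- "NaN"     # all NaN-only numbers share one key
    key[hasNA]  <- "NA"      # assigned last, so NA takes precedence over NaN
    key
}

z <- complex(real = c(NA, NaN, 0, NA, 1), imaginary = c(0, 1, NaN, NaN, 2))
match(cplx_key(z), cplx_key(z))
## [1] 1 2 2 1 5
```

Numbers that are supposed to compare equal get the same key, so a lookup by key can never miss a match that the equality test would have accepted.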

    > My change passes  'make check-all' fine, but I'm 100% sure that there will
    > be effects in package-space. ... one reason for this posting.

    > As mentioned above, note that the chash() function has been in
    > use for all three functions
    > match()
    > duplicated()
    > unique()
    > and the change will affect all three --- but just for the case of complex
    > vectors with NA or NaN's.
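
A quick way to see which behavior a given R build has is to run all three functions on a small complex vector with mixed NA / NaN parts. A minimal sketch with a made-up vector; the printed values depend on the R version, so none are shown here:

```r
## Complex numbers whose real and/or imaginary part is NA or NaN.
z <- complex(real = c(NA, NaN, 0, NA), imaginary = c(0, 1, NaN, NaN))

m <- match(z, z)       # index of the first "equal" element
d <- duplicated(z)     # TRUE where an earlier "equal" element exists
u <- unique(z)         # one representative per class

print(m)
print(d)
print(format(u))
```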

    > To show more, a small R session -- using my version of R-devel
    > == the proposition: 
    > The R script ('complex-NA-short.R') for (a bit more than) the
    > session is attached {{you can attach  text/plain easily}}:

    >> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
    >> ##           --- = NA_real_  but that does not exist e.g., in R 2.3.1
    >> ##                   similarly,  '1L', '2L', .. do not exist e.g., in R 2.3.1
    >> (z <- z[is.na(z)])
    > [1]       NA NaN+  0i       NA NaN+  1i       NA       NA       NA       NA
    > [9]   0+NaNi   1+NaNi       NA NaN+NaNi
    >> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
    > +     r <- matrix( , length(x), length(y))
    > +     for(i in seq(along=x))
    > +         for(j in seq(along=y))
    > +             r[i,j] <- identical(x[i], y[j], ...)
    > +     r
    > + }
    >> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
    >> ## a version that works in older versions of R, where identical() had fewer arguments!
    >> outerID.picky <- function(x,y) {
    > +     nF <- length(formals(identical)) - 2
    > +     do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
    > + }
    >> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is  a wild guess
    >> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
                             
    >  [1,] | . . . . . . . . . . .
    >  [2,] . | . . . . . . . . . .
    >  [3,] . . | . . . . . . . . .
    >  [4,] . . . | . . . . . . . .
    >  [5,] . . . . | . . . . . . .
    >  [6,] . . . . . | . . . . . .
    >  [7,] . . . . . . | . . . . .
    >  [8,] . . . . . . . | . . . .
    >  [9,] . . . . . . . . | . . .
    > [10,] . . . . . . . . . | . .
    > [11,] . . . . . . . . . . | .
    > [12,] . . . . . . . . . . . |
    >> try(# for older R versions
    > + stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
    > + )
    >> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
    > [1] 1 2 1 2 1 1 1 1 2 2 1 2
    >> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
    >> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)
    >       format   Re   Im   mz
    >  [1,]       NA <NA> 0    1 
    >  [2,] NaN+  0i NaN  0    2 
    >  [3,]       NA <NA> 1    1 
    >  [4,] NaN+  1i NaN  1    2 
    >  [5,]       NA 0    <NA> 1 
    >  [6,]       NA 1    <NA> 1 
    >  [7,]       NA <NA> <NA> 1 
    >  [8,]       NA NaN  <NA> 1 
    >  [9,]   0+NaNi 0    NaN  2 
    > [10,]   1+NaNi 1    NaN  2 
    > [11,]       NA <NA> NaN  1 
    > [12,] NaN+NaNi NaN  NaN  2 
    >> 
    > -------------------------------
    > Note that 'mz <- match(z, z)' and hence the last column of the matrix above
    > are very different in current R, 
    > distinguishing most kinds of NA / NaN  against the documentation (and the
    > real/numeric case).
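
On the side question in the outerID() comment of the session ("can we get outer() to work?"): one possible approach is to let outer() run over index vectors and wrap the scalar comparison with Vectorize(). A minimal sketch; outerID2() is a hypothetical name, and this variant needs a newer R than the old versions the script also targets:

```r
## Hypothetical alternative to outerID(): outer() over indices,
## with the scalar identical() call vectorized via Vectorize().
outerID2 <- function(x, y, ...) {
    f <- function(i, j) identical(x[i], y[j], ...)
    outer(seq_along(x), seq_along(y), Vectorize(f))
}

v <- c(1+0i, 2+0i, 1+0i)
outerID2(v, v)   # logical 3 x 3 matrix; TRUE where elements are identical
```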

    > Martin Maechler
    > R Core Team


    > ### Basically a shortened version of  the PR#16885 -- complex part b)
    > ### of  R/tests/reg-tests-1c.R

    > ## b) complex 'x' with different kinds of NaN
    > x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
    > ##           --- = NA_real_  but that does not exist e.g., in R 2.3.1
    > ##                   similarly,  '1L', '2L', .. do not exist e.g., in R 2.3.1
    > (z <- z[is.na(z)])
    > outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
    > r <- matrix( , length(x), length(y))
    > for(i in seq(along=x))
    > for(j in seq(along=y))
    > r[i,j] <- identical(x[i], y[j], ...)
    > r
    > }
    > ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
    > ## a version that works in older versions of R, where identical() had fewer arguments!
    > outerID.picky <- function(x,y) {
    > nF <- length(formals(identical)) - 2
    > do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
    > }
    > oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is  a wild guess
    > symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
    > try(# for older R versions
    > stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
    > )
    > (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
    > zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
    > print(cbind(format = format(z), t(zRI), mz), quote=FALSE)

    > ## compute  match(z[i], z) , for  i = 1,2,..,12  :
    > (m1z <- sapply(z, match, table = z))
    > ## 1 2 1 2 2 2 1 2 2 2 1 2   # R 1.2.3  (2001-04-26)
    > ## 1 2 3 4 1 3 7 8 2 4 8 7   # R 1.4.1  (2002-01-30)
    > ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 1.5.1  (2002-06-17)
    > ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 1.8.1  (2003-11-21)
    > ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 2.0.1  (2004-11-15)
    > ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.1.1  (2005-06-20)
    > ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.3.1  (2006-06-01)
    > ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 2.5.1  (2007-06-27)
    > ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.10.1 (2009-12-14)
    > ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 3.1.1  (2014-07-10)
    > ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 3.2.5 -- and 3.3.0 patched
    > ## 1 2 1 2 1 1 1 1 2 2 1 2   # <<-- Martin's R-devel and proposed future R

    > if(!exists("anyNA", mode="function")) anyNA <- function(x) any(is.na(x))
    > stopifnot(apply(zRI, 2, anyNA)) # *all* are  NA *or* NaN (or both)
    > is.NA <- function(.) is.na(.) & !is.nan(.)
    > (iNaN <- apply(zRI, 2, function(.) any(is.nan(.))))
    > (iNA <-  apply(zRI, 2, function(.) any(is.NA (.)))) # has non-NaN NA's
    > ## In Martin's version of R-devel :
    > stopifnot(identical(m1z == 1, iNA),
    > identical(m1z == 2, !iNA))
    > ## m1z uses match(x, *) with length(x) == 1 and failed in R 3.3.0
    > stopifnot(identical(m1z, mz))