[Rd] Bug/Inconsistency in merge() with all.x when first nonmatching column in y is matrix

Russ Hamilton behamilton at google.com
Mon Oct 10 16:46:14 CEST 2016


I've noticed inconsistent behavior in merge() when using all.x=TRUE.
After some digging I narrowed it down to the following test cases:
1) The snippet below doesn't work as expected: for rows that are in a
but not in b, the non-matching columns take the value from the first
matching row instead of being NA:
--- Snip >>>
NUM <- 25
a <- data.frame(id=factor(letters[1:NUM]), qq=rep(NA, NUM), rr=rep(1.0, NUM))
b <- data.frame(id=c("e","a","f","y","x"))

## %o% (outer product) makes mm a 5 x 1 matrix column; here it comes first
b$mm <- as.vector(c(1,2,3.1,4.0,NA)) %o% 3.14
b$nn <- rep("from b", 5)

merge(a, b, by="id", all.x=TRUE)
<<< Snip ---
2) The modified snippet below works as expected:
--- Snip >>>
NUM <- 25
a <- data.frame(id=factor(letters[1:NUM]), qq=rep(NA, NUM), rr=rep(1.0, NUM))
b <- data.frame(id=c("e","a","f","y","x"))

## same columns as in 1), but the matrix column mm is now added last
b$nn <- rep("from b", 5)
b$mm <- as.vector(c(1,2,3.1,4.0,NA)) %o% 3.14

merge(a, b, by="id", all.x=TRUE)
<<< Snip ---
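To make the difference between the two cases easier to see than by reading the printed result, a small check along these lines might help. It is only a sketch: the names res and unmatched are introduced here for illustration, and the count it prints depends on whether the installed merge() still has the problem, so no expected value is given.

```r
NUM <- 25
a <- data.frame(id = factor(letters[1:NUM]), qq = rep(NA, NUM), rr = rep(1.0, NUM))
b <- data.frame(id = c("e", "a", "f", "y", "x"))
b$mm <- as.vector(c(1, 2, 3.1, 4.0, NA)) %o% 3.14  # 5 x 1 matrix column, added first
b$nn <- rep("from b", 5)

res <- merge(a, b, by = "id", all.x = TRUE)

# Rows of a that have no partner in b
unmatched <- !(res$id %in% b$id)

# Number of unmatched rows whose nn was filled in anyway; any value
# above 0 indicates the behavior described in case 1)
sum(unmatched & !is.na(res$nn))
```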

In src/library/base/R/merge.R:154, I see the following:
--- Snip >>>
for(i in seq_along(y)) {
    ## do it this way to invoke methods for e.g. factor
    if(is.matrix(y[[1]])) y[[1]][zap, ] <- NA
    else is.na(y[[i]]) <- zap
}
<<< Snip ---
Changing the '1's in the if statement to 'i's fixes this issue for me, i.e.:
--- Snip >>>
for(i in seq_along(y)) {
    ## do it this way to invoke methods for e.g. factor
    if(is.matrix(y[[i]])) y[[i]][zap, ] <- NA
    else is.na(y[[i]]) <- zap
}
<<< Snip ---
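The effect of the two loop variants can be reproduced outside of merge() with a stand-alone sketch. The helper blank_rows(), its fixed argument, and the toy y and zap values are made up for this illustration and are not part of merge.R; only the loop body follows the code quoted above.

```r
# Hypothetical helper that mimics the quoted loop, not merge() itself.
blank_rows <- function(y, zap, fixed = TRUE) {
  for (i in seq_along(y)) {
    # The original code always inspects column 1; the proposed fix uses i.
    j <- if (fixed) i else 1L
    if (is.matrix(y[[j]])) y[[j]][zap, ] <- NA
    else is.na(y[[i]]) <- zap
  }
  y
}

# Toy stand-in for the y part of the merge, with the matrix column first
y <- data.frame(nn = rep("from b", 5), stringsAsFactors = FALSE)
y$mm <- matrix(c(1, 2, 3.1, 4.0, NA) * 3.14, ncol = 1)
y <- y[c("mm", "nn")]

zap <- 4:5  # rows that should end up NA in every column

buggy <- blank_rows(y, zap, fixed = FALSE)
fixed <- blank_rows(y, zap, fixed = TRUE)

buggy$nn  # never blanked, because only column 1 is ever touched
fixed$nn  # entries 4 and 5 are NA
```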
I'm actually not sure whether the if statement is needed at all (the
"else" case seems to handle matrices just fine).
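
One caveat on dropping the if statement, as far as I can tell: the default is.na<- method assigns NA by plain (linear) indexing, so the "else" form would only cover a single-column matrix such as mm above. The sketch below uses made-up matrices m1 and m2 to show the difference; it is an illustration, not a test of merge() itself.

```r
# Single-column matrix: linear indices 4:5 are exactly rows 4 and 5
m1 <- matrix(1:5, ncol = 1)
is.na(m1) <- 4:5

# Two-column matrix: linear indices 4:5 only reach column 1
m2 <- matrix(1:10, ncol = 2)
is.na(m2) <- 4:5

# Row indexing, as in the "if" branch, blanks every column
m3 <- matrix(1:10, ncol = 2)
m3[4:5, ] <- NA

m1
m2  # column 2 still holds its original values in rows 4 and 5
m3
```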

--Russ Hamilton


