[R] Character vectors, NA's and "" in merges

David Kane <a296180 at mica.fmr.com>
Wed Sep 26 14:10:32 CEST 2001

I often use merge with data frames that contain character vectors whose
elements are sometimes "NA" (meaning the string "NA", which is obviously not
the same thing as NA in a numeric or factor vector). For example, the stock
ticker for Nabisco was "NA". Unfortunately (for me), merge seems to insist on
inserting "NA" for missing values. My question: Is there some way around this?
Here is a simple example:

> version
platform sparc-sun-solaris2.6
arch     sparc               
os       solaris2.6          
system   sparc, solaris2.6   
major    1                   
minor    3.0                 
year     2001                
month    06                  
day      22                  
language R                   

> a <- data.frame(x = 1:4)
> b <- data.frame(x = 1:3, y = c("NA", "a", "b"))
> merge(a, b, all.x = TRUE)
  x  y
1 1 NA
2 2  a
3 3  b
4 4 NA

Rows 1:3 are what I expect them to be. Row 4 is "wrong" in the sense that
data frame b did not contain a row for x = 4. Of course, there is a sense in
which *any* value, including "", that is placed in row 4 is potentially
misleading. Perhaps I am misunderstanding the meaning of "NA" in a character
vector (i.e., I am not allowed to have "real" values that are that string).
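
To make the distinction concrete, here is a minimal sketch of how I would check
whether an entry is the string "NA" or a genuinely missing value. It assumes a
more recent R release than the one shown above, in which data.frame() accepts
stringsAsFactors = FALSE so that y stays a character vector; I have not
confirmed that R 1.3.0 behaves the same way.

```r
# Assumption: a recent R release where stringsAsFactors = FALSE is available,
# so that y is kept as a character vector rather than converted to a factor.
a <- data.frame(x = 1:4)
b <- data.frame(x = 1:3, y = c("NA", "a", "b"), stringsAsFactors = FALSE)
m <- merge(a, b, all.x = TRUE)

# is.na() flags only the entry that merge() filled in for x = 4;
# the string "NA" in row 1 is an ordinary, non-missing value.
is.na(m$y)

# Comparing against the string "NA" picks out row 1; the comparison is
# itself NA for the missing entry in row 4.
m$y == "NA"
```

If this holds, the two cases can at least be told apart after the merge, even
when they look alike in printed output.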

If there were some way (a "nomatch" argument?) that the user could specify
what missing values are used for character strings, then I would be
fine. Again, I suspect that my real problem is not understanding how to specify
"NA" -- meaning Nabisco's ticker symbol -- in a character vector.
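
As a rough sketch of the behaviour I have in mind: merge() itself has no
"nomatch" argument, so merge_fill() below is a hypothetical wrapper of my own
naming, not part of R. It again assumes a recent R release with
stringsAsFactors = FALSE, and it only touches character columns.

```r
# merge_fill() is a hypothetical helper, not an R function. It performs a
# left merge and then replaces missing values in character columns with a
# user-supplied string.
merge_fill <- function(x, y, nomatch = "", ...) {
  # all.x = TRUE is fixed here, so it must not be passed again through ...
  m <- merge(x, y, all.x = TRUE, ...)
  for (nm in names(m)) {
    if (is.character(m[[nm]])) {
      # Note: this also fills NAs that were already present in the inputs,
      # not only the ones introduced by unmatched rows.
      m[[nm]][is.na(m[[nm]])] <- nomatch
    }
  }
  m
}

a <- data.frame(x = 1:4)
b <- data.frame(x = 1:3, y = c("NA", "a", "b"), stringsAsFactors = FALSE)
filled <- merge_fill(a, b, nomatch = "")
filled$y
```

With this, row 1 keeps Nabisco's ticker "NA" and row 4 gets the chosen
placeholder instead of a missing value.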

Any suggestions would be much appreciated.

Dave Kane