[R] Systematic treatment of missing values

David Soloveichik dsolov at caltech.edu
Sun May 28 08:19:02 CEST 2006


I am wondering whether there is a well-accepted approach to handling
missing values (NAs) in a programming language such as R.  For
example, most functions seem to propagate NA to the output whenever the
value of the missing entry could have affected the result.  In other
words, most functions are not willing to "take a stand" on what the
missing value was.  However, some functions don't seem to do this.  For
example,

 > c(1,2,3,NA) %in% c(2,3)
[1] FALSE  TRUE  TRUE FALSE

rather than: FALSE  TRUE  TRUE NA
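
To make the contrast concrete, here is a small sketch of the propagating behaviour next to %in%. The helper in_propagating is a hypothetical name introduced only for illustration; it is not part of base R, and it is just one way of getting the NA-propagating result described above.

```r
x <- c(1, 2, 3, NA)

# Typical propagation: the unknown value could change the answer,
# so the result is NA.
x == 2                 # FALSE  TRUE FALSE    NA
sum(x)                 # NA
sum(x, na.rm = TRUE)   # 6, after explicitly discarding the NA

# %in% does not propagate: it returns FALSE for the NA element.
x %in% c(2, 3)         # FALSE  TRUE  TRUE FALSE

# Hypothetical helper (not in base R): membership built from ==,
# so an NA on either side makes the answer NA unless a definite
# match is found.
in_propagating <- function(x, table) {
  vapply(x, function(xi) any(xi == table), logical(1))
}

in_propagating(x, c(2, 3))   # FALSE  TRUE  TRUE    NA
```

Because the helper is built on == and any(), it inherits their three-valued logic rather than taking a stand on the missing entry.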


Also, what is the logic of the following:

 > c(1,2,3,NA) %in% c(2,3,NA)
[1] FALSE  TRUE  TRUE  TRUE

Why is the last output value TRUE?  Why does R treat the NA on the
left-hand side of %in% as the same value as the NA on the right-hand
side?
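
One way to probe this, assuming (as the help page for match describes) that %in% is a thin wrapper around match(), is to compare what match() reports with what == reports for the same values. The variable names below are only for illustration.

```r
x   <- c(1, 2, 3, NA)
tab <- c(2, 3, NA)

# match() gives the position of each element of x in tab; here the NA
# in x is reported as found at position 3, i.e. NA is matched to NA.
match(x, tab)                      # NA  1  2  3

# Turning "found at some position" into TRUE/FALSE reproduces %in%.
match(x, tab, nomatch = 0L) > 0L   # FALSE  TRUE  TRUE  TRUE
x %in% tab                         # FALSE  TRUE  TRUE  TRUE

# By contrast, == refuses to take a stand on NA:
NA == NA         # NA
any(1 == tab)    # NA   (FALSE, FALSE, NA: a match is still possible)
any(2 == tab)    # TRUE (a definite match outweighs the NA)
```

So the TRUE seems to come from match() treating NA as an ordinary matchable value, whereas a comparison built on == would give NA for both the first and last elements.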

Thanks a lot,
David


