[R] Question about a perceived irregularity in R syntax

Duncan Murdoch murdoch.duncan at gmail.com
Fri Jul 23 14:52:18 CEST 2010


On 23/07/2010 7:14 AM, Duncan Murdoch wrote:
> Nordlund, Dan (DSHS/RDA) wrote:
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Peter Dalgaard
> >> Sent: Thursday, July 22, 2010 3:13 PM
> >> To: Pat Schmitz
> >> Cc: r-help at r-project.org
> >> Subject: Re: [R] Question about a perceived irregularity in R syntax
> >>
> >> Pat Schmitz wrote:
> >>> Both vector queries can select the values from the data.frame as written;
> >>> however, in the first form, assigning a value to the selected numbers fails.
> >>> Can you explain the reason this fails?
> >>>
> >>> dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))
> >>>
> >>> dat$Value[dat$Value == "NA"] <- 1 # Why does this fail to work,
> >>> dat$Value[dat$Value %in% NA] <- 1 # while this does work?
> >>>
> >>>
> >>> #Particularly when str() results in an equivalent class
> >>> dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))
> >>> str(dat$Value[dat$Value %in% NA])
> >>> str(dat$Value[dat$Value == "NA"])
> >>
> >> 1. NA and "NA" are very different things
> >> 2. check out is.na() and its help page
> >>
> >
> > I also would have suggested is.na to do the replacement.  What surprised me was that 
> >
> > dat$Value[dat$Value %in% NA] <- 1 
> >
> > actually worked.  I guess I always assumed that if 
> >
> >   > NA == NA
> >   [1] NA
> >
> > then an attempt to compare NA to elements in a vector would also return NA, but not so.
> >
> >   > NA %in% c(1,NA,3)
> >   [1] TRUE
> >
> >
> > Learned something new today,
>
> I suspect that's not intentional, though I'm not sure it should be 
> fixed.  According to the usual convention the result should be a logical NA.

Oops, not true. The behaviour is clearly documented in ?match:

Exactly what matches what is to some extent a matter of
definition. For all types, ‘NA’ matches ‘NA’ and no other
value. For real and complex values, ‘NaN’ values are regarded
as matching any other ‘NaN’ value, but not matching ‘NA’.

Thanks to Brian Ripley (the author of that paragraph) for pointing this 
out to me. Not sure how I missed it on my first reading, but the fact 
that it preceded my morning coffee might be a contributing factor.
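
For anyone who wants to see the distinction side by side, here is a minimal sketch. The vector x and the intermediate names (eq_na, in_na, in_nan, miss) are made up for illustration and are not from the thread; dat is the data frame from the original question.

```r
# Illustrative vector (not from the thread) with one NA and one NaN
x <- c(1, NA, 3, NaN)

# `==` propagates missingness: comparing with NA gives NA, not TRUE/FALSE
eq_na <- NA == NA              # NA

# match() and %in% treat NA as matching NA, and NaN as matching only NaN
in_na  <- x %in% NA            # FALSE  TRUE FALSE FALSE
in_nan <- x %in% NaN           # FALSE FALSE FALSE  TRUE

# is.na() is TRUE for both NA and NaN, so it is not identical to `%in% NA`
miss <- is.na(x)               # FALSE  TRUE FALSE  TRUE

# The replacement from the original question, written with is.na()
dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))
dat$Value[is.na(dat$Value)] <- 1
```

Note that is.na() and %in% NA only disagree when NaN values are present, so for the data in the question either form replaces the same two elements.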

Duncan Murdoch


