is.na(v)<-b (was: Re: [R] Beginner's query - segmentation fault)

Wed Oct 8 00:27:45 CEST 2003

I am puzzled by the advice to use is.na(x) <- TRUE instead of x <- NA.

?NA says
     Function `is.na<-' may provide a safer way to set missingness. It
     behaves differently for factors, for example.

However, "MAY provide" is a bit scary, and it doesn't say WHAT the
difference in behaviour is.

I must say that "is.na(x) <- ..." is rather repugnant, because it doesn't
work.  What do I mean?  Well, as the designers of SETL pointed out many
years ago, when they coined the term "sinister function call" for
f(...) <- ..., if you do
    f(x) <- y
then afterwards you expect
    f(x) == y
to be true.  So let's try it:

    > x <- c(1,NA,3)
    > is.na(x) <- c(FALSE,FALSE,TRUE)
    > x
    [1]  1 NA NA
    > is.na(x)
    [1] FALSE  TRUE  TRUE
                        vvvvv
So I _assigned_ c(FALSE,FALSE,TRUE) to is.na(x),
    but I _got_ c(FALSE,TRUE,TRUE) instead.
                        ^^^^^
That is not how a well behaved sinister function call should work,
and it's enough to scare someone off is.na()<- forever.
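Here is a minimal sketch of the same round-trip test in base R, with the comparison done by identical() instead of by eye. The last comparison assumes that the default method of is.na<- amounts to x[value] <- NA (that is how is.na<-.default reads in the base R sources). If so, that would explain the result: it only ever sets elements to NA, and a FALSE in the value leaves an existing NA alone rather than clearing it.

```r
# Round-trip test for the "sinister" assignment is.na(x) <- v:
# afterwards we would like is.na(x) to be identical to v.
x <- c(1, NA, 3)
v <- c(FALSE, FALSE, TRUE)

is.na(x) <- v
print(is.na(x))                # FALSE TRUE TRUE, which is not v
print(identical(is.na(x), v))  # FALSE: the round trip fails

# Assumption: the default method behaves like plain subassignment of NA,
# so it can add missingness but never remove it.
y <- c(1, NA, 3)
y[v] <- NA
print(identical(x, y))         # TRUE: same result as x[v] <- NA
```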

The obvious way to set elements of a variable to missing is ... <- NA.
Wouldn't it be better if that just plain worked?

Can someone give an example of is.na()<- and <-NA working differently
with a factor?  I just tried it:

    > x <- factor(c(3,1,4,1,5,9))
    > y <- x
    > is.na(x) <- x==1
    > y[y==1] <- NA
    > x
    [1] 3    <NA> 4    <NA> 5    9   
    Levels: 1 3 4 5 9
    > y
    [1] 3    <NA> 4    <NA> 5    9   
    Levels: 1 3 4 5 9

Both approaches seem to have given the same answer.  What did I miss?
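To check the factor case without relying on the printed output, the two results can be compared directly. This is a sketch in base R. identical() compares the integer codes as well as the levels and class attributes, so a difference in any of them would show up. It assumes a current R, where is.na<- is generic and base R supplies a factor method for it.

```r
# Compare is.na<- and [<- NA on a factor without eyeballing printed output.
x <- factor(c(3, 1, 4, 1, 5, 9))
y <- x

is.na(x) <- x == 1   # generic replacement function; a factor method exists
y[y == 1] <- NA      # ordinary subassignment of NA

print(identical(x, y))                  # codes, levels and class all compared
print(levels(x))                        # level "1" is kept even though unused
print(identical(levels(x), levels(y)))  # both approaches keep the same levels
```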