[R] Subassignments involving NAs in data frames

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jun 9 22:09:53 CEST 2005


On Thu, 9 Jun 2005, Thomas Lumley wrote:

> On Thu, 9 Jun 2005, McGehee, Robert wrote:
>
>> I'm seeing some inconsistent behavior when re-assigning values in a data
>> frame. The first assignment turns all of the 0s in my data frame to 2s,
>> the second fails to do so.

But they differ in several ways, so why is this labelled `inconsistent'?
Why not ask `what is the difference'?

The answer to the pertinent question is `the number of items to be 
replaced'.

>>> df1 <- data.frame(a = c(NA, 0, 3, 4))
>>> df2 <- data.frame(a = c(NA, 0, 0, 4))
>>> df1[df1 == 0] <- 2 ## Works
>>> df2[df2 == 0] <- 2
>> Error: NAs are not allowed in subscripted assignments
>
> Hmm. This looks like a bug to me.
>
>> Checking an old news file I see this:
>>    o	Subassignments involving NAs and with a replacement value of
>> 	length > 1 are now disallowed.	(They were handled
>> 	inconsistently in R < 2.0.0, see PR#7210.)  For data frames
>> 	they are disallowed altogether, even for logical matrix indices
>> 	(the only case which used to work).
>> 
>> which leads me to believe that the assignment for both df1 and df2
>> should fail ("data frame ... disallowed altogether"), however that seems
>> not to be the case, since the example works for df1.
>
> Yes, I think the bug is that it works

It has since been allowed in a few cases to avoid needlessly breaking 
existing code. (The curse of back-compatibility.)

In the first example there is only one value to be replaced, so there is 
no ambiguity in the meaning. In the second there are two values to be 
replaced, so the replacement has to be replicated to the needed length, 
and the rules for vectors (which disallow NA subscripts when the 
replacement has length > 1) then give the error message.

Another case which is allowed is if none of the values are to be replaced: 
that is all the logical indices are FALSE or NA.
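
As a rough illustration of the rule for vectors referred to above, here is 
a minimal sketch. The names x, y and res are purely illustrative, and the 
exact wording of the error message may differ between R versions. With an 
NA in a logical subscript, a replacement of length one is unambiguous, 
whereas a longer replacement is refused:

```r
# Length-one replacement: elements with an NA subscript are left alone
x <- c(NA, 0, 0, 4)
x[x == 0] <- 2

# Longer replacement: with NAs in the subscript it is unclear which
# replacement values go where, so the assignment signals an error
y <- c(NA, 0, 0, 4)
res <- tryCatch({
  y[y == 0] <- c(5, 6)
  "assigned"
}, error = function(e) "error")
```

After running this, x holds NA 2 2 4, y is unchanged, and res records that 
the second assignment failed.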

>> Also, the
>> vectorized version works as expected (because the replacement value has
>> a length of 1).
>> 
>>> vec1 <- c(NA, 0, 3, 4)
>>> vec2 <- c(NA, 0, 0, 4)
>>> vec1[vec1 == 0] <- 2 ## Works
>>> vec2[vec2 == 0] <- 2 ## Also works
>
> I'm not sure that this is supposed to work, either, but it might be.

Reading help("[") should help alleviate your uncertainty, for this is 
explicitly documented there.

>> Is this behavior for data frames intentional? What's the best
>> alternative to df1[df1 == 0] <- 2 that doesn't fail in situations such
>> as df2? A simple loop over columns?
>
> df2[df2 %in% 0] is the recommended method.

That index is a logical vector of length one (one element per column of 
the data frame, not one per cell), so it does not select the zeros.  Try

ind <- df2 == 0
df2[ind & !is.na(ind)] <- 2

but this is really just a loop over columns implemented in [<-.data.frame.
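
For comparison, a sketch of the two approaches side by side, under the 
assumption of a data frame with only numeric columns. The name df3 and the 
loop variables nm and col are hypothetical, and the loop is only meant to 
convey the idea; it is not the actual code of [<-.data.frame.

```r
df2 <- data.frame(a = c(NA, 0, 0, 4))

# The %in% index has one element per column, not one per cell
n_index <- length(df2 %in% 0)

# Mask the NAs so that the logical matrix contains only TRUE and FALSE
ind <- df2 == 0
df2[ind & !is.na(ind)] <- 2

# An explicit loop over columns doing the same replacement
df3 <- data.frame(a = c(NA, 0, 0, 4))
for (nm in names(df3)) {
  col <- df3[[nm]]
  col[!is.na(col) & col == 0] <- 2   # NA-free logical subscript
  df3[[nm]] <- col
}
```

Both df2 and df3 end up with the zeros replaced and the NA left in place. 
The masked-index form does not rely on how a particular R version treats 
NAs in logical matrix subscripts.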

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



