[Rd] Assigning to factor[[i]]

Sun Mar 15 16:49:38 CET 2009

I am a bit confused about the semantics of classes, [, and [[.

For at least some important built-in classes (factors and dates), both
the getter and the setter methods of [ operate on the class.  For [[,
however, the getter operates on the class while the setter operates on
the underlying vector.  Is this behavior
documented? (I haven't found any documentation of it.) Is it
intentional?  (i.e. is it a bug or a feature?)  There are also cases
where invalid assignments don't signal an error.

A simple example:

> fact <- factor(2,levels=2:4)        # master copy
> f0 <- fact; f0; dput(f0)
[1] 2
Levels: 2 3 4
structure(1L, .Label = c("2", "3", "4"), class = "factor")

> f0 <- fact; f0[1] <- 3; f0; dput(f0)     # use [ setter
[1] 3
Levels: 2 3 4
structure(2L, .Label = c("2", "3", "4"), class = "factor")

> f0 <- fact; f0[[1]] <- 3L; f0; dput(f0)   # use [[ setter
[1] 4          # ? didn't convert 3 to factor
Levels: 2 3 4
structure(3L, .Label = c("2", "3", "4"), class = "factor")   # modified underlying vector
> f0[1]
[1] 4
Levels: 2 3 4
# but result is a valid factor

> f0 <- fact; f0[[1]] <- 3; f0; dput(f0)   # use [[ setter
[1] 4
Levels: 2 3 4
structure(3, .Label = c("2", "3", "4"), class = "factor")  # didn't convert to 3L
> f0[1]
Error in class(y) <- oldClass(x) :
  adding class "factor" to an invalid object

I suppose f0[1] and f0[[1]] fail here because the underlying vector
must be integer and not numeric? If so, why didn't assigning to
f0[[1]] cause an error? And why didn't printing f0 cause the same
error?
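
To make the "must be integer" guess concrete, here is a minimal sketch of the check I have in mind. The helper name valid_factor_codes is made up for illustration and is not part of R. It looks only at a bare vector of codes and a set of levels, so it never has to build an invalid factor.

```r
# Hypothetical helper (not part of base R): TRUE only if `codes` could
# legally back a factor with the given levels, i.e. integer storage and
# every non-NA code in 1..length(levels).
valid_factor_codes <- function(codes, levels) {
  is.integer(codes) &&
    all(is.na(codes) | (codes >= 1L & codes <= length(levels)))
}

valid_factor_codes(3L, c("2", "3", "4"))                # TRUE
valid_factor_codes(3, c("2", "3", "4"))                 # FALSE: double storage
valid_factor_codes(c("c", "3", "2"), c("b", "c", "a"))  # FALSE: character storage
valid_factor_codes(25L, c("b", "c", "a"))               # FALSE: code out of range
```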

Here are some more examples. Consider

fac <- factor(c("b","a","c"),levels=c("b","c","a"))

f <- fac; f[1] <- "c"; dput(f)
# structure(c(2L, 3L, 2L), .Label = c("b", "c", "a"), class = "factor")
#### OK, implicit conversion of "c" to factor(c) was performed

f <- fac; f[1] <- 25; dput(f)
# Warning message:
# In `[<-.factor`(`*tmp*`, 1, value = 25) :
#   invalid factor level, NAs generated
# structure(c(NA, 3L, 2L), .Label = c("b", "c", "a"), class = "factor")
#### OK, warning given for invalid value, which becomes an NA
#### Same thing happens for f[1]<-"foo"

So far, so good.  Now compare to what happens with fac[[...]] <- ...

f <- fac; f[[1]] <- 25; dput(f)
# structure(c(25, 3, 2), .Label = c("b", "c", "a"), class = "factor")
#### No error given, but invalid factor generated

f <- fac; f[[1]] <- "c"; dput(f)
# structure(c("c", "3", "2"), .Label = c("b", "c", "a"), class = "factor")
#### No conversion performed; no error given; invalid factor generated

f
# [1] <NA> <NA> <NA>
# Levels: b c a
#### Prints as though it were factor(c(NA,NA,NA)) with no warning/error

f[]
# Error in class(y) <- oldClass(x) :
#  adding class "factor" to an invalid object
#### But f[] gives an error
#### Same error with f[1] and f[[1]]
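
For comparison, the behavior I would have expected from a class-aware setter can be sketched by going through the [ setter and refusing values that are not existing levels. The wrapper name set_level_strict is hypothetical, and stopping (rather than warning and inserting NA) is my own choice here, not what `[<-.factor` does.

```r
# Hypothetical wrapper (not part of base R): assign through the `[` setter,
# but stop instead of warning when a value is not an existing level.
set_level_strict <- function(f, i, value) {
  stopifnot(is.factor(f))
  value <- as.character(value)
  bad <- setdiff(value[!is.na(value)], levels(f))
  if (length(bad) > 0L) {
    stop("invalid factor level(s): ", paste(bad, collapse = ", "))
  }
  f[i] <- value   # dispatches to `[<-.factor`, so the result stays a factor
  f
}

fac <- factor(c("b", "a", "c"), levels = c("b", "c", "a"))

f <- set_level_strict(fac, 1, "c")
dput(as.integer(f))                  # c(2L, 3L, 2L)

res <- try(set_level_strict(fac, 1, 25), silent = TRUE)
inherits(res, "try-error")           # TRUE
```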

Another interesting case is f[1] <- list(NULL) -- which correctly
gives an error -- versus f[[1]] <- list(), which gives no error but
results in an f which is not a factor at all:

f <- fac; f[[1]] <- list(); class(f); dput(f)
# [1] "list"
# list(list(), 3L, 2L)

I can see that being able to modify the underlying vector of a classed
object directly would be very valuable functionality, but there is an
asymmetry here: f[[1]]<- modifies the underlying vector, but f[[1]]
accesses the classed vector.  Presumably you need to do
unclass(f)[[1]] to see the underlying value.  But on the other hand,
unclass doesn't have a setter (`unclass<-`), so you can't say
unclass(f)[[1]] <- ...
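
Lacking `unclass<-`, the closest thing I can see is to unclass, modify, and re-class by hand. A minimal sketch, relying on the fact that unclass() drops only the class attribute and keeps the levels:

```r
fact <- factor(2, levels = 2:4)

codes <- unclass(fact)     # integer code 1L, still carrying the "levels" attribute
codes[[1]] <- 3L           # modify the underlying code directly

f0 <- codes
class(f0) <- "factor"      # re-class; valid because the codes are integer and in range

as.character(f0)           # "4"
```

This at least makes the intent explicit, and the re-classing step rejects a non-integer vector rather than silently producing an invalid factor.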

I have not been able to find documentation of all this in the R
Language Definition or in the man page for [/[[, but perhaps I'm
looking in the wrong place?

            -s