[Rd] unlist on nested lists of factors (PR#12572)

davison at stats.ox.ac.uk davison at stats.ox.ac.uk
Wed Aug 20 15:25:10 CEST 2008


Here is a description and a proposed solution for a bug in unlist().

I've used R version 2.7.2 RC (2008-08-18 r46382) to look at this, under
Linux.

unlist(recursive=TRUE) incorrectly returns a factor with zero levels
when passed either a nested list of factors, or a list of data frames
containing only factor columns. You can't print() the result.

x <- list(list(v=factor("a")))
str(unlist(x))
## Factor w/ 0 levels: NA
## - attr(*, "names")= chr "v"
## Warning message:
## In str.default(unlist(x)) : 'object' does not have valid levels() 
y <- list(data.frame(v=factor("a")))
str(unlist(y))
## Factor w/ 0 levels: NA
## - attr(*, "names")= chr "v"
## Warning message:
## In str.default(unlist(y)) : 'object' does not have valid levels()

unlist is defined as

unlist <- function(x, recursive=TRUE, use.names=TRUE)
{
    if(.Internal(islistfactor(x, recursive))) {
        lv <- unique(.Internal(unlist(lapply(x, levels), recursive, FALSE)))
        nm <- if(use.names) names(.Internal(unlist(x, recursive, use.names)))
        res <- .Internal(unlist(lapply(x, as.character), recursive, FALSE))
        res <- match(res, lv)
        ## we cannot make this ordered as level set may have been changed
        structure(res, levels=lv, names=nm, class="factor")
    } else .Internal(unlist(x, recursive, use.names))
}

The error occurs because, in both cases, islistfactor recurses at the
C level, finds that all leaf elements are factors, and so the if test
condition is TRUE. However, the two instances of lapply do not
recurse: they are applied to the inner list (or data frame) rather
than to the factors inside it, so lapply(x, levels) yields no levels
and match() then returns NA for every element. A possible solution is
to replace both instances of lapply with rapply. This results in
appropriate factor answers in both cases:

str(unlist(x))
## Factor w/ 1 level "a": 1
## - attr(*, "names")= chr "v"
str(unlist(y))
## Factor w/ 1 level "a": 1
## - attr(*, "names")= chr "v"
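
For anyone who wants to try the idea without patching base, here is a
minimal standalone sketch of the rapply approach. The name
unlist_factors is hypothetical (it is not part of R), it assumes every
leaf of x is a factor, and it uses the public unlist() on plain
character lists instead of the .Internal calls, so it is an
illustration rather than the actual patch.

```r
## Hypothetical helper (not part of base R): flatten a possibly nested
## list whose leaves are all factors into one factor.
unlist_factors <- function(x, use.names = TRUE) {
    ## rapply() visits every leaf, descending into nested lists and
    ## data frames; how = "list" keeps the nesting (and names) intact.
    lv_list  <- rapply(x, levels, how = "list")
    chr_list <- rapply(x, as.character, how = "list")

    lv  <- unique(unlist(lv_list, use.names = FALSE))
    res <- unlist(chr_list, use.names = FALSE)
    nm  <- if (use.names) names(unlist(chr_list))

    ## As in unlist(), the result is never made an ordered factor.
    structure(match(res, lv), levels = lv, names = nm, class = "factor")
}

x <- list(list(v = factor("a")))
y <- list(data.frame(v = factor("a")))

## Why lapply() fails here: it sees the inner list, which has no levels.
str(lapply(x, levels))

str(unlist_factors(x))
str(unlist_factors(y))
```

Both calls should give a one-level factor named "v", matching the
output shown above.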

An alternative is to not return a factor result, by altering the if
test condition so that nested lists of factors, and lists of
factor-only data frames, fail.
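
A rough R-level sketch of what such a stricter test might look like
follows. The name is_flat_factor_list is hypothetical, and the real
check lives in C inside islistfactor, so this only illustrates the
intended behaviour: TRUE when every top-level element is itself a
factor, FALSE for anything nested. Treating an empty list as FALSE is
an assumption made here.

```r
## Hypothetical predicate (not part of base R): TRUE only when every
## top-level element of x is itself a factor.
is_flat_factor_list <- function(x) {
    is.list(x) && length(x) > 0L &&
        all(vapply(x, is.factor, logical(1)))
}

is_flat_factor_list(list(factor("a"), factor("b")))     # flat list of factors
is_flat_factor_list(list(list(v = factor("a"))))        # nested list
is_flat_factor_list(list(data.frame(v = factor("a"))))  # list of data frames
```

With a test like this, the two problem cases would take the else
branch and come back as a plain vector rather than a factor.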


Dan

-- 
www.stats.ox.ac.uk/~davison


