[R] Re: Excluding levels in table and xtabs

Michael Friendly friendly at yorku.ca
Thu Dec 12 19:01:03 CET 2002


Having looked over the replies and examined the code, I can't
see any reason for table (and xtabs) not to honor the
exclude= argument for factors.  There are often reasons for wanting
to exclude certain levels, even non-missing ones, when making a table.

In my application, John Fox suggested that I could circumvent
the problem by reading in the .csv file with na.strings="".
However, it was only for making tables that I wanted to exclude
the "" categories.
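
For reference, a minimal sketch of that workaround.  The CSV text and
the column names here are made up for illustration; they are not from
my actual file.

```r
# Hypothetical data: a tiny CSV with one empty field in the 'colour' column
csv <- "id,colour\n1,White\n2,\n3,Brown\n4,White\n"

# Default: the empty string is kept as a level "" of the factor
d1 <- read.csv(text = csv, stringsAsFactors = TRUE)
levels(d1$colour)

# With na.strings = "": empty fields are read as NA,
# and table() leaves NA out by default
d2 <- read.csv(text = csv, na.strings = "", stringsAsFactors = TRUE)
tab <- table(d2$colour)
tab
```

The catch is that the "" entries are then NA everywhere in the data
frame, not just when tabulating.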

The change to table() to have it honor the exclude option for
factors is quite straightforward.  I wonder if the R team
will consider placing this on its list.  (revised version below)
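
The idea behind the change can be sketched outside of table() as
follows.  The vector x is a made-up example, and going through
as.character() is just a cautious choice on my part so the result does
not depend on how factor() treats a factor input.

```r
# Hypothetical factor with an unwanted "" level
x <- factor(c("a", "", "b", "a", ""))
levels(x)

# Rebuilding the factor with exclude = "" drops that level;
# the corresponding entries become NA
f <- factor(as.character(x), exclude = "")
levels(f)

# Tabulating the rebuilt factor no longer shows the "" category
tab <- table(f)
tab
```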

More generally, in working with tables I often find the need
to collapse or reorder the levels of some dimensions of an
n-way table.  I've written a collapse.table to do the first,
e.g.,

sex <- c("Male", "Female")
age <- letters[1:6]
education <- c("low", "med", "high")
data <- expand.grid(sex=sex, age=age, education=education)
counts <- rpois(36, 100)
data <- cbind(data, counts)
    # tabulate the counts into a 3-way table
t1 <- xtabs(counts ~ sex + age + education, data=data)
    # collapse age to 3 levels
t2 <- collapse.table(t1, age=c("A", "A", "B", "B", "C", "C"))
    # collapse age to 3 levels and education to 2
t3 <- collapse.table(t1, age=c("A", "A", "B", "B", "C", "C"),
    education=c("low", "low", "high"))

and it's not too hard to do the second.  However, I wonder if some
more general and convenient tools for working with tables are
available somewhere I've missed.  
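
To make the collapsing operation concrete, here is a rough sketch using
only base R.  This is not the collapse.table code itself, just one way
to get the same kind of result: convert the table to a data frame of
cell counts, recode the levels, and re-tabulate.  The 2 x 6 table and
the name new.age are made up for illustration.

```r
# Hypothetical 2 x 6 table of counts
set.seed(1)
t1 <- as.table(array(rpois(12, 100), dim = c(2, 6),
    dimnames = list(sex = c("Male", "Female"), age = letters[1:6])))

# Long form: one row per cell, with the count in a Freq column
df <- as.data.frame(t1)

# Map the six age levels onto three; the mapping follows the level order a..f
new.age <- c("A", "A", "B", "B", "C", "C")
df$age <- factor(new.age[as.integer(df$age)])

# Re-tabulate, summing Freq within the merged levels
t2 <- xtabs(Freq ~ sex + age, data = df)
t2
```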

For example, for mosaicplots it is often crucial to be able to
treat table variables as ordered factors, where the ordering is
the one that shows the pattern of association, not the default.
For a data frame, this can be done with

subset$Skin.Colour <- factor(subset$Skin.Colour, levels=c("White",
"Brown", "Other", "Black"))

but it's more unwieldy with a table object.
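
For a table that already exists, one way to get a different ordering is
to index the dimension by level name.  A small sketch, with a made-up
data frame d and a hypothetical Outcome variable:

```r
# Hypothetical data; table() orders the levels alphabetically by default
d <- data.frame(
    Skin.Colour = c("White", "Brown", "Black", "Other", "White", "Brown"),
    Outcome     = c("yes", "no", "yes", "no", "no", "yes"))
tab <- table(d)
rownames(tab)

# Reorder the first dimension by indexing with the desired level order;
# as.table() makes sure the result is treated as a table afterwards
ord <- c("White", "Brown", "Other", "Black")
tab2 <- as.table(tab[ord, ])
tab2
```

With several dimensions to reorder this means one index vector per
dimension, which is where it starts to get clumsy.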

-Michael

------- table.R ------
#  modified to respect the exclude argument for factors
#     use exclude=NULL for former behavior for factors (or change default)

table <- function (..., exclude = c(NA, NaN),
   dnn = list.names(...), deparse.level = 1)
{
    list.names <- function(...) {
        l <- as.list(substitute(list(...)))[-1]
        nm <- names(l)
        fixup <- if (is.null(nm))
            seq(along = l)
        else nm == ""
        dep <- sapply(l[fixup], function(x)
            switch (deparse.level + 1,
                "",
                if (is.symbol(x)) as.character(x) else "",
                deparse(x)[1]
            )
        )
        if (is.null(nm))
            dep
        else {
            nm[fixup] <- dep
            nm
        }
    }

    args <- list(...)
    if (length(args) == 0)
        stop("nothing to tabulate")
    if (length(args) == 1 && is.list(args[[1]])) {
        args <- args[[1]]
        if (length(dnn) != length(args))
            dnn <- if (!is.null(argn <- names(args)))
                argn
            else
                paste(dnn[1], 1:length(args), sep='.')
    }
    bin <- 0
    lens <- NULL
    dims <- integer(0)
    pd <- 1
    dn <- NULL
    for (a in args) {
        if (is.null(lens)) lens <- length(a)
        else if (length(a) != lens)
            stop("all arguments must have the same length")
        # MF: make exclude work for factors too
        #    if (is.factor(a))
        #        cat <- a
        #    else
        cat <- factor(a, exclude = exclude)
        nl <- length(l <- levels(cat))
        dims <- c(dims, nl)
        dn <- c(dn, list(l))
        ## requiring   all(unique(as.integer(cat)) == 1:nlevels(cat))  :
        bin <- bin + pd * (as.integer(cat) - 1)
        pd <- pd * nl
    }
    names(dn) <- dnn
    bin <- bin[!is.na(bin)]
    if (length(bin)) bin <- bin + 1 # otherwise, that makes bin NA
    y <- array(tabulate(bin, pd), dims, dimnames = dn)
    class(y) <- "table"
    y
}


-- 
Michael Friendly              friendly at yorku.ca
York University               http://www.math.yorku.ca/SCS/friendly.html
Psychology Department
4700 Keele Street             Tel:  (416) 736-5115 x66249
Toronto, Ontario, M3J 1P3     Fax:  (416) 736-5814



