[Rd] data frame method for as.table()

Marc Schwartz marc_schwartz at me.com
Mon May 23 19:50:56 CEST 2016


> On May 23, 2016, at 11:46 AM, Ernest Adrogué <eac at openmailbox.org> wrote:
> 
> Hello,
> 
> Currently it's possible to convert an object of class table to a data frame
> with as.data.frame.table(), but there's no ready-made function, AFAIK, to do
> the reverse operation, i.e. conversion of a data frame to a table.
> 
> Do you think it would be a good idea to add a data.frame method to
> as.table(), to allow such conversions?
> 
> The idea is that if `x' is a table and `y <- as.data.frame(x)', then the
> object returned by `as.table(y)' should be equal to `x'.
> 
> Below is a proof of concept
> 
> as.table.data.frame <- function(x, ..., response) {
>    if (missing(response))
>        resp <- ncol(x)
>    else {
>        resp <- match(response[1L], names(x))
>        if (is.na(resp))
>            stop('not found: ', response[1L])
>    }
>    x[-resp] <- lapply(x[-resp], as.factor)
>    if (any(do.call(table, x[-resp]) > 1L))
>        stop('repeated frequency value')
>    dn <- lapply(x[-resp], function(y) {
>        if (any(is.na(y))) c(levels(y), NA) else levels(y)
>    })
>    ind <- mapply(function(val, lev) match(val, lev), x[-resp], dn)
>    out <- array(dim=unlist(lapply(dn, length)), dimnames=dn)
>    out[ind] <- x[[resp]]
>    as.table(out)
> }
> 
> and a simple usage example:
> 
>> (y <- table(foo=c('a','a',NA,'b','b'), useNA='always'))
> foo
>   a    b <NA> 
>   2    2    1 
>> (yy <- as.data.frame(y))
>   foo Freq
> 1    a    2
> 2    b    2
> 3 <NA>    1
>> as.table(yy)
> foo
>   a    b <NA> 
>   2    2    1 
>> 
> 
> Any thoughts?

Hi,

I have not tried an exhaustive set of examples, but I believe that ?xtabs will get you most of the way there, with the exception of NA handling, which would require either nuanced use of ?addNA or preprocessing factors in the input data frame to possibly add NA as a factor level where needed.

BTW, from ?as.data.frame.table, see the reference there to xtabs() at the end:

"The as.data.frame method for objects inheriting from class "table" can be used to convert the array-based representation of a contingency table to a data frame containing the classifying factors and the corresponding entries (the latter as component named by responseName). This is the inverse of xtabs."


For example, from ?xtabs, using the ?UCBAdmissions dataset, which is a 3 dimension table:

> str(UCBAdmissions)
 table [1:2, 1:2, 1:6] 512 313 89 19 353 207 17 8 120 205 ...
 - attr(*, "dimnames")=List of 3
  ..$ Admit : chr [1:2] "Admitted" "Rejected"
  ..$ Gender: chr [1:2] "Male" "Female"
  ..$ Dept  : chr [1:6] "A" "B" "C" "D" ...

DF <- as.data.frame(UCBAdmissions)

> str(DF)
'data.frame':	24 obs. of  4 variables:
 $ Admit : Factor w/ 2 levels "Admitted","Rejected": 1 2 1 2 1 2 1 2 1 2 ...
 $ Gender: Factor w/ 2 levels "Male","Female": 1 1 2 2 1 1 2 2 1 1 ...
 $ Dept  : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ Freq  : num  512 313 89 19 353 207 17 8 120 205 ...


# Using your function
DF.Table <- as.table.data.frame(DF)

> str(DF.Table)
 table [1:2, 1:2, 1:6] 512 313 89 19 353 207 17 8 120 205 ...
 - attr(*, "dimnames")=List of 3
  ..$ Admit : chr [1:2] "Admitted" "Rejected"
  ..$ Gender: chr [1:2] "Male" "Female"
  ..$ Dept  : chr [1:6] "A" "B" "C" "D" ...


# Using xtabs()
DF.xtabs <- xtabs(Freq ~ ., data = DF)

> str(DF.xtabs)
 xtabs [1:2, 1:2, 1:6] 512 313 89 19 353 207 17 8 120 205 ...
 - attr(*, "dimnames")=List of 3
  ..$ Admit : chr [1:2] "Admitted" "Rejected"
  ..$ Gender: chr [1:2] "Male" "Female"
  ..$ Dept  : chr [1:6] "A" "B" "C" "D" ...
 - attr(*, "class")= chr [1:2] "xtabs" "table"
 - attr(*, "call")= language xtabs(formula = Freq ~ ., data = DF)


Note that DF.xtabs has additional attributes set as a result of the use of xtabs().

In the example that you provided above, you would need to use something along the lines of:

> xtabs(Freq ~ addNA(foo), data = yy)
addNA(foo)
   a    b <NA> 
   2    2    1 

so that xtabs() includes the NA level, or for a larger data frame with a lot of columns, pre-process the columns so that NA is included in the factor levels where you desire.

That latter issue with NA's and xtabs() BTW, has bitten a lot of people over the years, where the recommendation to use:

> xtabs(Freq ~ foo, data = yy, exclude = NULL, na.action = na.pass)
foo
a b 
2 2 

does not actually work as believed.

Regards,

Marc Schwartz



More information about the R-devel mailing list