[R] generalizing expand.table: table -> data.frame

Marc Schwartz marc_schwartz at comcast.net
Tue Jan 20 19:05:51 CET 2009


on 01/20/2009 10:38 AM Michael Friendly wrote:
> In http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3064.html
> a method was given for converting a frequency table to an expanded data
> frame representing each observation as a set of factors.  A slightly
> modified version was later included in the NCStats package, available
> only on http://rforge.net/ (and it has too many dependencies to be useful).
> 
> I've tried to make it more general, allowing an input data frame in
> frequency form, and where the frequency variable is not named "Freq".
> This is my working version:
> 
> __begin__ expand.table.R
> expand.table <- function (x, var.names = NULL, freq="Freq", ...)
> {
> #  allow: a table object, or a data frame in frequency form
>   if(inherits(x,"table")) {
>     x <- as.data.frame.table(x)
>   }
> ##  This fails:
> #   df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,freq]), ], simplify = FALSE)
> #   df <- subset(do.call("rbind", df), select = -freq)
> 
> #  This works, when the frequency variable is named Freq
>   df <- sapply(1:nrow(x), function(i) x[rep(i, each = x[i,"Freq"]), ],
>                simplify = FALSE)
>   df <- subset(do.call("rbind", df), select = -Freq)
> 
>   for (i in 1:ncol(df)) {
>       df[[i]] <- type.convert(as.character(df[[i]]), ...)
>   }
>   rownames(df) <- NULL
>   if (!is.null(var.names)) {
>       if (length(var.names) < dim(df)[2])
>           stop("Too few var.names given.")
>       else if (length(var.names) > dim(df)[2])
>           stop("Too many var.names given.")
>       else names(df) <- var.names
>   }
>   df
> }
> __end__   expand.table.R
> 
> Thus for the following table
> 
> library(vcd)
> art <- xtabs(~Treatment + Improved, data = Arthritis)
> 
> 
>> art
>          Improved
> Treatment None Some Marked
>   Placebo   29    7      7
>   Treated   13    7     21
> 
> expand.table (above) gives a data frame of sum(art)=84 observations,
> with factors Treatment and Improved.
>> artdf <- expand.table(art)
>> str(artdf)
> 'data.frame':   84 obs. of  2 variables:
> $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
> $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
>>
> 
> I've generalized this so it works with data frames in frequency form,
> 
>> as.data.frame(art)
>  Treatment Improved Freq
> 1   Placebo     None   29
> 2   Treated     None   13
> 3   Placebo     Some    7
> 4   Treated     Some    7
> 5   Placebo   Marked    7
> 6   Treated   Marked   21
> 
>> art.df2 <- expand.table(as.data.frame(art))
>> str(art.df2)
> 'data.frame':   84 obs. of  2 variables:
> $ Treatment: Factor w/ 2 levels "Placebo","Treated": 1 1 1 1 1 1 1 1 1 1 ...
> $ Improved : Factor w/ 3 levels "Marked","None",..: 2 2 2 2 2 2 2 2 2 2 ...
>>
> 
> But, and here's the rub, when the frequency variable in a data frame is
> called something other than "Freq", as in this example,
> 
>> GSS
>     sex party count
> 1 female   dem   279
> 2   male   dem   165
> 3 female indep    73
> 4   male indep    47
> 5 female   rep   225
> 6   male   rep   191
> 
> all the changes I've tried using the freq= argument in expand.table()
> fail in various ways.
> 
> Can someone help?

Hi Michael,

I think that the following modifications to my original code, which also
incorporate the changes made in the NCStats package, should work.


expand.dft <- function(x, var.names = NULL, freq = "Freq", ...)
{
  #  allow: a table object, or a data frame in frequency form
  if(inherits(x, "table"))
    x <- as.data.frame.table(x, responseName = freq)

  freq.col <- which(colnames(x) == freq)
  if (length(freq.col) == 0)
      stop(paste(sQuote("freq"), "not found in column names"))

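  # Repeat row i of x as many times as its frequency count
  DF <- sapply(seq_len(nrow(x)),
               function(i) x[rep(i, each = x[i, freq.col]), ],
               simplify = FALSE)

  # Drop the frequency column; drop = FALSE keeps a data frame
  # even when only one other column remains
  DF <- do.call("rbind", DF)[, -freq.col, drop = FALSE]

  for (i in seq_len(ncol(DF)))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]), ...)
  }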

  rownames(DF) <- NULL

  if (!is.null(var.names))
  {
    if (length(var.names) < dim(DF)[2])
    {
      stop(paste("Too few", sQuote("var.names"), "given."))
    } else if (length(var.names) > dim(DF)[2]) {
      stop(paste("Too many", sQuote("var.names"), "given."))
    } else {
      names(DF) <- var.names
    }
  }

  DF
}
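
For reference, here is a minimal sketch of why the select = -freq attempt in
the quoted code fails, and why looking the column up by position avoids the
problem. The two-row data frame is just a cut-down copy of the GSS example;
the variable names res and kept are made up for illustration.

```r
# Cut-down frequency-form data frame (first two rows of the GSS example)
x <- data.frame(sex   = c("female", "male"),
                party = c("dem", "dem"),
                count = c(279, 165))
freq <- "count"

# subset() evaluates 'select' with the column names as variables. 'freq' is
# not a column of x, so it resolves to the string "count" in the calling
# environment, and applying unary minus to a string is an error.
res <- tryCatch(subset(x, select = -freq),
                error = function(e) conditionMessage(e))
is.character(res)   # TRUE: an error message was caught, not a data frame

# Looking up the column position first sidesteps non-standard evaluation.
freq.col <- which(colnames(x) == freq)
kept <- x[, -freq.col, drop = FALSE]
names(kept)
```

The drop = FALSE in the last step is an extra precaution in this sketch: it
keeps the result a data frame even if only one column is left after the
frequency column is removed.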



> art
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21


> head(expand.dft(art), 10)
   Treatment Improved
1    Placebo     None
2    Placebo     None
3    Placebo     None
4    Placebo     None
5    Placebo     None
6    Placebo     None
7    Placebo     None
8    Placebo     None
9    Placebo     None
10   Placebo     None



art.dft <- as.data.frame.table(art)

> art.dft
  Treatment Improved Freq
1   Placebo     None   29
2   Treated     None   13
3   Placebo     Some    7
4   Treated     Some    7
5   Placebo   Marked    7
6   Treated   Marked   21

names(art.dft)[3] <- "count"

> art.dft
  Treatment Improved count
1   Placebo     None    29
2   Treated     None    13
3   Placebo     Some     7
4   Treated     Some     7
5   Placebo   Marked     7
6   Treated   Marked    21


> head(expand.dft(art.dft, freq = "count"), 10)
   Treatment Improved
1    Placebo     None
2    Placebo     None
3    Placebo     None
4    Placebo     None
5    Placebo     None
6    Placebo     None
7    Placebo     None
8    Placebo     None
9    Placebo     None
10   Placebo     None
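
As a cross-check, the same expansion can also be sketched without the
row-by-row rbind, by building one repeated row index with rep(). This is an
alternative idiom, not the code posted above: expand_rows is a hypothetical
helper name, it only accepts a data frame (not a table object), and it skips
the type.convert() step. The GSS values are copied from the quoted message.

```r
# Hypothetical helper (not part of expand.dft): expand a frequency-form
# data frame by repeating each row index 'count' times in a single step.
expand_rows <- function(x, freq = "Freq") {
  stopifnot(is.data.frame(x), freq %in% names(x))
  idx <- rep(seq_len(nrow(x)), times = x[[freq]])
  out <- x[idx, names(x) != freq, drop = FALSE]
  rownames(out) <- NULL
  out
}

# GSS data as shown in the quoted message
GSS <- data.frame(
  sex   = c("female", "male", "female", "male", "female", "male"),
  party = c("dem", "dem", "indep", "indep", "rep", "rep"),
  count = c(279, 165, 73, 47, 225, 191)
)

GSS.long <- expand_rows(GSS, freq = "count")

# One row per observation: the row count equals the sum of the frequencies
nrow(GSS.long) == sum(GSS$count)

# Round trip: tabulating the expanded data recovers the original counts
back <- as.data.frame(table(GSS.long), responseName = "count")
all(back$count == GSS$count)
```

Tabulating the expanded data frame and comparing it with the original counts
is a quick way to confirm that an expansion function, whichever version is
used, has not dropped or duplicated any observations.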


HTH,

Marc Schwartz



