[R] Apply or Tapply to Build Set of Tables

Dennis Murphy djmuser at gmail.com
Tue May 24 05:03:09 CEST 2011


Hi:

Here's one way to do the pairwise tables. I'm restricting attention to
the variables with only a few levels, but the idea should be clear
enough.
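The first part of the question (one table per variable) needs no pairing at all. Here is a minimal base-R sketch that uses the same five variables as the code below. It applies lapply() over the selected columns instead of calling attach(). The name one_way is only illustrative.

```r
# One-way frequency table for each selected infert variable.
# infert ships with R's datasets package, so it is available without data().
vars <- names(infert)[c(1, 3:6)]

# lapply() over a data frame iterates over its columns and keeps their names,
# so one_way is a named list of tables (one_way$education, one_way$parity, ...)
one_way <- lapply(infert[vars], table)
one_way$education
```
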

# Put the variable names into a vector
vars <- names(infert)[c(1, 3:6)]

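# Use expand.grid() to cross the variable names. Note that this also
# produces self-pairs (e.g. parity vs parity) and both orderings of some pairs.
# Keep the names as character strings: get() needs a name, not a factor.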
tvars <- expand.grid(x = vars[-length(vars)],
                     y = vars[-1], stringsAsFactors = FALSE)

library(plyr)
# Function to return the value of the chi-square statistic
# and its p-value. Inputs x and y are character strings of variable names;
# inside with(infert, ...), get(x) and get(y) fetch the columns with those names.
x2fun <- function(x, y) {
    res <- with(infert, chisq.test(get(x), get(y)))
    data.frame(stat = res$statistic, pval = res$p.value)
}

# mdply() calls x2fun(x = ..., y = ...) on each row of tvars and
# row-binds the results into a single data frame
mdply(tvars, x2fun)

> mdply(tvars, x2fun)
           x           y         stat          pval
1  education      parity 1.058180e+02  3.708990e-18
2     parity      parity 1.240000e+03 5.270109e-246
3    induced      parity 5.969764e+01  4.134443e-09
4       case      parity 6.036266e-02  9.999534e-01
5  education     induced 1.653059e+01  2.383898e-03
6     parity     induced 5.969764e+01  4.134443e-09
7    induced     induced 4.960000e+02 4.910976e-106
8       case     induced 7.322983e-02  9.640473e-01
9  education        case 2.289618e-03  9.988558e-01
10    parity        case 6.036266e-02  9.999534e-01
11   induced        case 7.322983e-02  9.640473e-01
12      case        case 2.435293e+02  6.686097e-55
13 education spontaneous 3.626057e+00  4.589717e-01
14    parity spontaneous 5.083091e+01  1.876475e-07
15   induced spontaneous 1.819802e+01  1.128831e-03
16      case spontaneous 3.286172e+01  7.314205e-08
Warning messages:
1: In chisq.test(get(x), get(y)) :
  Chi-squared approximation may be incorrect
2: In chisq.test(get(x), get(y)) :
  Chi-squared approximation may be incorrect
3: In chisq.test(get(x), get(y)) :
  Chi-squared approximation may be incorrect
4: In chisq.test(get(x), get(y)) :
  Chi-squared approximation may be incorrect

The warnings have to do with expected cell counts below 5 in the bivariate tables.
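chisq.test() warns when any expected count is below 5. An expected count is row total × column total / n. The sketch below takes case vs parity (row 4 above) as an example. It prints the expected counts directly. As one possible workaround, it then requests a Monte Carlo p-value through chisq.test()'s simulate.p.value argument. The seed and B = 2000 replicates are arbitrary choices for this sketch.

```r
# Build the two-way table for one of the flagged pairs
tab <- with(infert, table(case, parity))

# Expected counts under independence; suppress the warning we already know about
res <- suppressWarnings(chisq.test(tab))
round(res$expected, 2)
any(res$expected < 5)   # TRUE here: the rare parity levels have small expected counts

# Monte Carlo p-value, which does not rely on the chi-square approximation
set.seed(1)
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
sim
```
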
Also watch out for Simpson's paradox :)
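The expand.grid() grid has 16 rows, but some of them add nothing. Rows 2, 7 and 12 test a variable against itself. Three pairs appear twice, for example rows 4 and 10 (case vs parity). The minimal base-R sketch below assumes the same five variables. It visits each unordered pair once via combn(). It indexes columns with [[ ]] instead of get(). It also reports the smallest expected count per table, so the pairs behind the warnings are visible. The helper name pair_test is hypothetical, chosen for this sketch.

```r
vars <- names(infert)[c(1, 3:6)]

# Hypothetical helper: chi-square test for one pair of column names p = c(x, y)
pair_test <- function(p) {
  tab <- table(infert[[p[1]]], infert[[p[2]]])
  # Silence the approximation warning and report min_expected instead
  res <- suppressWarnings(chisq.test(tab))
  data.frame(x = p[1], y = p[2],
             stat = unname(res$statistic),
             df   = unname(res$parameter),
             pval = res$p.value,
             min_expected = min(res$expected),
             stringsAsFactors = FALSE)
}

# combn() applies FUN to each of the choose(5, 2) = 10 unordered pairs;
# simplify = FALSE returns a list of one-row data frames to rbind together
results <- do.call(rbind, combn(vars, 2, FUN = pair_test, simplify = FALSE))
results
```

The statistics match the corresponding rows of the mdply() output, because the chi-square statistic does not depend on which variable is x and which is y.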

HTH,
Dennis

On Mon, May 23, 2011 at 5:31 PM, Sparks, John James <jspark4 at uic.edu> wrote:
> Dear R Helpers,
>
> First, I apologize for asking for help on the first of my topics.  I have
> been looking at the posts and pages for apply, tapply etc, and I know that
> the solution to this must be ridiculously easy, but I just can't seem to
> get my brain around it.  If I want to produce a set of tables for all the
> variables in my data, how can I do that without having to type them into
> the table command one by one?  So, I would like to use (t? s? r?)apply to
> use one command instead of the following set of table commands:
>
> data(infert, package = "datasets")
> attach(infert)
>
> table.education<-table(education)
> table.age<-table(age)
> table.parity<-table(parity)
> etc.
>
>
> To make matters worse, what I subsequently need is the chi-square for each
> and all of the pairs of variables.  Such as:
>
> chi.education.age<-chisq.test(table(education,age))
> chi.education.parity<-chisq.test(table(education,parity))
> chi.age.parity<-chisq.test(table(age,parity))
> etc.
>
> Your guidance would be much appreciated.
>
> --John J. Sparks, Ph.D.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


