[R] Type of multi-valued variable

Frank E Harrell Jr fharrell at virginia.edu
Sat Feb 15 20:05:03 CET 2003


On Sat, 15 Feb 2003 14:41:09 +0100
Fan <xiao.gang.fan1 at libertysurf.fr> wrote:

> Thanks to Frank for pointing that out. There are so many "misc" functions in the
> package Hmisc that I haven't yet explored all of its functionality!
> 
> The implementation of mChoice / summary() is very interesting, and it could
> be a good starting point for adding more functionality to the mChoice class.
> 
> I have a small question about using summary.formula() in Hmisc:
> how can I get the cross-tabulation result as an array, as xtabs does?
> 
> For example, suppose "titanic" is a data set like the following:
> > str(titanic)
> `data.frame':   1313 obs. of  11 variables:
>  $ pclass   : Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ...
>  $ survived : int  1 0 0 0 1 1 1 0 1 0 ...
>  $ sex      : Factor w/ 2 levels "female","male": 1 1 2 1 2 2 1 2 1 2 ...
>  $ age      : num  29.000  2.000 30.000 25.000  0.917 ...
>  ...
> 
> > ftable(xtabs( ~ sex + pclass + survived, data=titanic))
>               survived   0   1
> sex    pclass                 
> female 1st               9 134
>        2nd              13  94
>        3rd             134  79
> male   1st             120  59
>        2nd             148  25
>        3rd             440  58
> 
> My question is how to get that with summary() from Hmisc?
> (survived could be a mChoice variable)
> 
> Thanks in advance
> --
> Fan
> 
> > 
> > On Mon, 10 Feb 2003 21:51:50 +0100
> > Fan <xiao.gang.fan1 at libertysurf.fr> wrote:
> > 
> > > Hi,
> > >
> > > Some time ago I read a thread on the R discussion list
> > > about multi-valued variables (what was called a checklist).
> > > At the time, Gregory intended to add some general code
> > > to his gregmisc package.
> > >
> > > I'm wondering whether some general code or package is available?
> > >
> > > A general class that handles this type of variable
> > > would be very useful for survey processing,
> > > as multiple-response questions are often used.
> > > The simple operations applied to these variables are holecount,
> > > cross-tabulation with other variables, transformation to single-coded
> > > variables such as the number of responses, etc.
> > >
> > > Thanks in advance for any help
> > > --
> > > Fan
> > >
> > 
> > Fan, take a look at pp. 38-44 of http://hesweb1.med.virginia.edu/biostat/s/doc/summary.pdf, where examples of the mChoice (multiple choice) function in Hmisc are given.

Hello Fan,

[This reminds me that I forgot to mail you a paper I promised. I will do that on Monday. Sorry.]

For cross-classification, summarize in Hmisc is favored over summary(..., method='cross'). Also, summary(..., method='cross') will not handle mChoice variables until I make a small change to use the new function described below.  If you define

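as.character.mChoice <- function(x) {
  # x is treated as a logical matrix with one column per choice
  lev <- dimnames(x)[[2]]   # choice labels
  d <- dim(x)
  w <- rep('', d[1])        # one (initially empty) string per observation
  for(j in seq_len(d[2])) {
    # append choice j where selected, comma-separated from earlier choices
    w <- paste(w, ifelse(w != '' & x[,j], ',', ''),
               ifelse(x[,j], lev[j], ''), sep='')
  }
  w
}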

you can add the line 
        if(inherits(xi,'mChoice')) xi <- as.character(xi) else
before
        if(is.matrix(xi) && ncol(xi) > 1) 
in summary.formula and obtain an (ugly) output with method='cross'.  Defining as.character.mChoice will also make summarize work with mChoice variables (here I'm using the titanic3 data frame):

n <- nrow(titanic3)
set.seed(1)
w <- c('good','bad','ugly')
a <- factor(sample(w,n,TRUE))
b <- factor(sample(w,n,TRUE))
m <- mChoice(a,b)
table(as.character(m))

      bad  bad,good  bad,ugly      good good,ugly      ugly 
      146       275       284       150       319       135 
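
As a minimal sketch of what as.character.mChoice does, assuming (as the function's use of dim and dimnames implies) that an mChoice object is a logical matrix with one column per choice, the following toy example runs without Hmisc. The matrix x and its class assignment are made up for illustration; the function body is repeated from above so that the snippet stands alone.

```r
# Method from the message above, repeated so this snippet is self-contained
as.character.mChoice <- function(x) {
  lev <- dimnames(x)[[2]]
  d <- dim(x)
  w <- rep('', d[1])
  for(j in seq_len(d[2])) {
    w <- paste(w, ifelse(w != '' & x[,j], ',', ''),
               ifelse(x[,j], lev[j], ''), sep='')
  }
  w
}

# Hypothetical stand-in for an mChoice object: 3 observations, 3 choices
x <- matrix(c(TRUE,  FALSE, TRUE,    # column 'bad'
              TRUE,  TRUE,  FALSE,   # column 'good'
              FALSE, FALSE, TRUE),   # column 'ugly'
            nrow=3, dimnames=list(NULL, c('bad','good','ugly')))
class(x) <- 'mChoice'

# Same guard as the line proposed for summary.formula
xi <- x
if(inherits(xi, 'mChoice')) xi <- as.character(xi)
xi          # "bad,good" "good" "bad,ugly"
table(xi)   # one count per distinct combination of choices
```

Each observation collapses to a single comma-separated label, which is why the combinations can then be tabulated or used as a stratification variable like any other character vector.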

attach(titanic3)
summarize(survived,llist(sex,pclass,m),
          function(y)c(died=sum(y==0),lived=sum(y==1)))

      sex pclass         m survived lived
1  female    1st       bad        0    14
2  female    1st  bad,good        1    28
3  female    1st  bad,ugly        0    34
4  female    1st      good        3    21
5  female    1st good,ugly        1    33
6  female    1st      ugly        0     9
7  female    2nd       bad        2    13
8  female    2nd  bad,good        1    28
9  female    2nd  bad,ugly        4    13
10 female    2nd      good        1     9
11 female    2nd good,ugly        4    19
. . . .

Here m is the multiple choice variable, not survived, but you get the idea.
These changes will be in the next version of Hmisc.
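
If an array like the one xtabs returns is needed, one possibility (a sketch using base R only, not a feature of Hmisc) is to cross-tabulate the collapsed labels directly. The vectors below are made-up toy data standing in for sex, as.character(m), and survived; the name mlab is hypothetical.

```r
# Toy data (hypothetical); mlab plays the role of as.character(m)
sex      <- factor(rep(c('female','male'), each=6))
mlab     <- rep(c('bad','bad,good','good'), times=4)
survived <- rep(c(0,1), times=6)

tab <- xtabs(~ sex + mlab + survived)  # 3-way contingency table (an array)
dim(tab)                               # sex x mlab x survived
ftable(tab)                            # flat display, as in the question
```

The result can be indexed like any array, for example tab['female', 'bad,good', '1'].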
-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat



