[R] Percentages in contingency tables *warning trivial question*

Chuck Cleland ccleland at optonline.net
Mon Dec 13 11:47:17 CET 2004


   You might want to look at CrossTable() in the gmodels package of the 
gregmisc bundle.  For example:

 > library(gmodels)
 > sex <- as.factor(sample(c("Male", "Female"), 100, replace=TRUE))
 > case <- as.factor(sample(c("Case", "Control"), 100, replace=TRUE))
 > CrossTable(sex, case)

    Cell Contents
|-----------------|
|               N |
|   N / Row Total |
|   N / Col Total |
| N / Table Total |
|-----------------|

Total Observations in Table:  100

             | case
         sex |      Case |   Control | Row Total |
-------------|-----------|-----------|-----------|
      Female |        21 |        29 |        50 |
             |     0.420 |     0.580 |     0.500 |
             |     0.420 |     0.580 |           |
             |     0.210 |     0.290 |           |
-------------|-----------|-----------|-----------|
        Male |        29 |        21 |        50 |
             |     0.580 |     0.420 |     0.500 |
             |     0.580 |     0.420 |           |
             |     0.290 |     0.210 |           |
-------------|-----------|-----------|-----------|
Column Total |        50 |        50 |       100 |
             |     0.500 |     0.500 |           |
-------------|-----------|-----------|-----------|
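
If the exact "N and %" layout from the question is wanted, something along
these lines in base R might get closer. This is only a sketch: the helper
name n_pct_table() is made up for illustration, and the data are invented
to reproduce the "Sex: M" row of the question (percentages are taken
within each column, as in that row).

```r
# Hypothetical data reproducing the "Sex: M" row of the question:
# 23 male / 27 female cases and 27 male / 23 female controls.
sex  <- factor(rep(c("M", "F", "M", "F"), times = c(23, 27, 27, 23)))
case <- factor(rep(c("Cases", "Controls"), each = 50))

# n_pct_table() is an illustrative helper, not part of any package.
n_pct_table <- function(rows, cols) {
  n   <- addmargins(table(rows, cols))       # counts with "Sum" margins
  pct <- 100 * sweep(n, 2, n["Sum", ], "/")  # percent of each column total
  out <- matrix(0, nrow(n), 2 * ncol(n))
  odd <- seq(1, by = 2, length.out = ncol(n))
  out[, odd]     <- n                        # N columns
  out[, odd + 1] <- round(pct, 1)            # % columns
  dimnames(out) <- list(
    sub("^Sum$", "Total", rownames(n)),
    paste(rep(sub("^Sum$", "Total", colnames(n)), each = 2), c("N", "%"))
  )
  out
}

tab1 <- n_pct_table(sex, case)
print(tab1)
```

The result is an ordinary numeric matrix with alternating N and % columns
for Cases, Controls and Total, so it can be printed or passed to
write.table() for use elsewhere.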

Rachel Pearce wrote:
> I hesitate to post this question in the light of recent threads; indeed,
> I have hesitated for several weeks. However, I have come to a full stop
> and really need some help if I am going to progress. I am a new user of
> R for medical statistics. I have attempted to read all the relevant
> documents, but would welcome any suggestions as to what I have missed.
> 
> I am trying to construct "table 1" type contingency (mostly) tables. I
> would like to include percentages, thus:
> 
>                 Cases         Controls      Total
>                 N     %       N     %       N     %
> Total           50    100     50    100     100   100
> 
> Sex: M          23    46      27    54      50    50
> 
> etc...
> 
> I hesitate even more to mention it here, but I am thinking of something
> along the lines of PROC TABULATE in SAS.
> 
> The closest I have found in the documentation I have read so far is an
> example given in the help for "addmargins":
> 
>     Bee <- sample( c("Hum","Buzz"), 177, replace=TRUE )
>     Sea <- sample( c("White","Black","Red","Dead"), 177, replace=TRUE )
>     ...
>     # Weird function needed to return the N when computing percentages
>     sqsm <- function( x ) sum( x )^2/100
>     B <- table(Sea, Bee)
>     round(sweep(addmargins(B, 1, list(list(All=sum, N=sqsm))), 2,
>                 apply( B, 2, sum )/100, "/" ), 1)
>     round(sweep(addmargins(B, 2, list(list(All=sum, N=sqsm))), 1,
>                 apply(B, 1, sum )/100, "/"), 1)
> 
> ... which introduced me to "sweep" and maybe could be extended to do
> what I want. But I don't like using mysterious "weird" functions.
> 
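
On sweep(): roughly, sweep(x, MARGIN, STATS, FUN) combines each row
(MARGIN = 1) or column (MARGIN = 2) of x with the matching element of
STATS using FUN. Dividing each column by its own total therefore gives
column proportions, which is what prop.table(x, 2) returns as well. A
small sketch with made-up counts:

```r
# Made-up 2 x 2 counts, for illustration only.
B <- matrix(c(21, 29, 29, 21), nrow = 2,
            dimnames = list(sex  = c("Female", "Male"),
                            case = c("Case", "Control")))

# Divide every column by its own total: column proportions.
by_col <- sweep(B, 2, colSums(B), "/")

# prop.table() with margin 2 gives the same result.
stopifnot(isTRUE(all.equal(by_col, prop.table(B, 2))))

print(round(100 * by_col, 1))   # column percentages
```
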
> I recently found Paul Johnson's Rtips, where
> http://www.ku.edu/~pauljohn/R/Rtips.html#6.1 mentions the function
> prop.table, which is also close to what I want. But how can I show Ns
> and percentages in the same table?
> 
> I wondered if there were a function which does this already. Or perhaps
> I should just write one for myself? Or should I not be trying to do this
> in R in the first place and go back to Excel (I no longer have access to
> SAS)? Please, NO! Or perhaps I am looking for the wrong thing in the
> manuals? 
> 
> I have followed recent advice to look at Frank E Harrell's detailed
> tabulation code, but this seems to produce many errors on my system and
> with my version of R (see below). I do not have access to LaTeX
> (apologies for incorrect typography). I can provide details of the
> errors if it turns out that the answer to my question is RTFM by Prof
> Harrell.
> 
> I would like to add my two pennorth to the debate about "trivial"
> questions, of which I assume this is one. I believe that a very large
> amount of what is hard about learning R on one's own with documentation
> but without a real person, is a matter of vocabulary. I only found sweep
> and prop.table by chance, since neither of them is indexed under words
> like "proportion" or "percentage", which are what I had been looking for.
> Similarly I still do not know exactly what "sweep" does, since I have
> never heard this verb used in a mathematical / statistical context, and
> the help on sweep states that what it does is sweep. I have experienced
> many similar examples in the last few weeks. This is not to say that
> there is anything wrong with the help on these functions nor with the
> help in general, but what R does not have is an extensive indexing
> system by synonyms and uses. It is largely for reasons like this, I
> believe, that trivial questions continue to be asked. If one does not
> know the name of the function to do "verb" and one has tried "verb" and
> the synonyms which spring to mind and drawn a blank, where to next? 
> 
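
One partial workaround when the verb does not turn anything up is to
search by object name instead. apropos() matches a pattern against the
names of objects on the search path, and help.search() looks through the
names, aliases, titles and keywords of installed help pages. Neither
searches by synonym, so this is only a sketch of where to look next, and
what help.search() finds depends on which packages are installed.

```r
# apropos() lists objects on the search path whose names match a pattern.
hits <- apropos("table")
print(hits)

# Check whether prop.table is among the matches.
print("prop.table" %in% hits)

# help.search() results depend on the installed packages, so the call is
# left commented out here.
# help.search("proportion")
```
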
> Another reason for difficulty is that while a function may exist to do
> something, it is sometimes hard to find the package where it is
> contained, e.g. Frank Harrell's functions seem to be in a package called
> Hmisc which is not listed in the drop-down box for "load package".
> 
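
A package that does not appear in the "load package" list is most likely
not installed yet. Assuming Hmisc is available from CRAN for the R
version in use, a check along these lines shows whether it needs
installing first (the variable name have_hmisc is just for illustration):

```r
# Check whether Hmisc is installed before trying to load it.
have_hmisc <- requireNamespace("Hmisc", quietly = TRUE)

if (have_hmisc) {
  library(Hmisc)
} else {
  # Installing needs a network connection, so it is only suggested here.
  message("Hmisc is not installed; try install.packages(\"Hmisc\")")
}
```
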
> System and version information:
> 
> platform i386-pc-mingw32
> arch     i386           
> os       mingw32        
> system   i386, mingw32  
> status                  
> major    2              
> minor    0.1            
> year     2004           
> month    11             
> day      15             
> language R     
> 
> Rachel Pearce
> 
> British Society of Blood and Marrow Transplantation
> 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894