[R] Tabulating using arbitrary numbers of factors

jim holtman jholtman at gmail.com
Sat Oct 3 02:59:57 CEST 2009


Try the 'reshape' package. Add a constant column to sum over, then recast on the factors:

> require(reshape)
> # add a column to accumulate on
> tmp$inc <- 1
> recast(tmp, f1 + f2 + f3 ~ ., sum)
Using f1, f2, f3 as id variables
       f1       f2    f3 (all)
1    Male    White  0-20     3
2    Male    White 21-40     4
3    Male    White 41-60     2
4    Male    White 61-80     3
5    Male    Black  0-20     3
6    Male    Black 21-40     4
7    Male    Black 41-60     2
8    Male    Black 61-80     3
9    Male Hispanic  0-20     4
10   Male Hispanic 21-40     4
11   Male Hispanic 41-60     4
12   Male Hispanic 61-80     3
13   Male    Other  0-20     3
14   Male    Other 21-40     2
15   Male    Other 41-60     2
16   Male    Other 61-80     4
17 Female    White  0-20     2
18 Female    White 21-40     4
19 Female    White 41-60     4
20 Female    White 61-80     3
21 Female    Black  0-20     5
22 Female    Black 21-40     3
23 Female    Black 41-60     4
24 Female    Black 61-80     1
25 Female Hispanic  0-20     1
26 Female Hispanic 21-40     2
27 Female Hispanic 41-60     4
28 Female Hispanic 61-80     3
29 Female    Other  0-20     4
30 Female    Other 21-40     2
31 Female    Other 41-60     3
32 Female    Other 61-80     5
>
>
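
A base-R alternative, offered only as a sketch and not part of the original exchange: as.data.frame() on a table returns one row per combination of levels, with the level names as values, so filtering out the zero counts should leave just the combinations that occur. The function name count_combos and the small example data frame below are hypothetical and made up for illustration.

```r
# Hypothetical helper: count each observed combination of the factors
# named in `vars`, for any number of columns of `data`.
count_combos <- function(data, vars) {
  # table() on a data frame cross-tabulates all of its columns;
  # as.data.frame() flattens the result to one row per combination,
  # with level names (not integer codes) and a Freq column.
  counts <- as.data.frame(table(data[vars]))
  # Drop the combinations that never occur.
  counts[counts$Freq > 0, , drop = FALSE]
}

# Made-up data for illustration only
races <- data.frame(
  horse = c("Bedevil", "Bedevil", "Prairie Wolf"),
  date  = c("20080514", "20080514", "20080404")
)
res <- count_combos(races, c("horse", "date"))
print(res)
```

Passing the column names as a character vector, rather than the factors themselves through `...`, keeps the output column names predictable however the function is called.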


On Fri, Oct 2, 2009 at 2:15 PM, Andrew Spence <aspence at rvc.ac.uk> wrote:
> Dear R-help,
>
>
>
> First of all, thank you VERY much for any help you have time to offer. I
> greatly appreciate it.
>
>
>
> I would like to write a function that, given an arbitrary number of factors
> from a data frame, tabulates the number of occurrences of each unique
> combination of the factors. Clearly, this works:
>
>
>
>> table(horse,date,surface)
>
> <SNIP>
>
> , , surface = TURF
>
>                   date
> horse               20080404 20080514 20081015 20081025 20081120 20081203 20090319
>  Bedevil                  0        0        0        0        0        0        0
>  Cut To The Point       227        0        0        0        0        0        0
>
> <SNIP>
>
>
>
> But I would prefer output that skips all the zeros, flattens any dimensions
> greater than 2, and gives the level names rather than codes. I can write
> code specifically for n factors like this (here for 2 factors):
>
>
>
> ft <- function(x,y) {cbind(
> levels(x)[unique(cbind(x,y))[,1]],levels(y)[unique(cbind(x,y))[,2]],
> table(x,y)[unique(cbind(x,y))])}
>
>
>
> which gives the lovely output I'm looking for:
>
>
>
> #      [,1]                [,2]       [,3]
> # [1,] "Cut To The Point"  "20080404" "227"
> # [2,] "Prairie Wolf"      "20080404" "364"
> # [3,] "Bedevil"           "20080514" "319"
> # [4,] "Prairie Wolf"      "20080514" "330"
>
>
>
> But my attempts to make this into a function that handles arbitrary numbers
> of factors as separate input arguments have failed. The closest I can get is:
>
>
>
> ft2 <- function (...) { cbind( unique(cbind(...)),
> table(...)[unique(cbind(...))] ) }
>
>
>
> giving:
>
>> ft2(horse,date)
>
>      horse date
>  [1,]     2    1 227
>  [2,]     9    1 364
>  [3,]     1    2 319
>  [4,]     9    2 330
>  [5,]     9    3 291
>  [6,]    12    3 249
>  [7,]    10    3 286
>  [8,]     5    4 217
>  [9,]     3    4 426
> [10,]     8    4 468
> [11,]     9    5 319
> [12,]    13    5 328
> [13,]    12    5 138
> [14,]     7    6 375
> [15,]    11    6 366
> [16,]     4    7 255
> [17,]     6    7 517
>
>
>
> I would be greatly indebted to anyone willing to show me how to make the
> above function take arbitrary inputs and still produce output displaying
> factor level names instead of the underlying integer codes.
>
>
>
> Cheers and thanks for your time!
>
>
>
> Andrew Spence
> RCUK Academic Research Fellow
> Structure and Motion Laboratory
> Royal Veterinary College
> Hawkshead Lane
> North Mymms, Hatfield
> Hertfordshire AL9 7TA
> +44 (0) 1707 666988
>
> mailto:aspence at rvc.ac.uk
>
> http://www.rvc.ac.uk/sml/People/andrewspence.cfm
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



