[R] descriptive stats by cells in factorial design

Mike Miller mbmiller+l at gmail.com
Sun Aug 4 10:15:08 CEST 2013


Summary of my question:

"I have a 5-way factorial design, two levels per factor, so 32 cells, and 
I mostly just want the means and standard deviations for the contents of 
every cell.  Similarly, it would be nice to also have the range and maybe 
some percentiles, if there is a function that would just pump them out."

I received three answers:


On Sat, 3 Aug 2013, Søren Højsgaard wrote:

> The summaryBy function in the doBy package may help you.


On Sat, 3 Aug 2013, Jim Lemon wrote:

> You may find that the barNest function in plotrix is useful for showing 
> the means and standard deviations of nested designs.


On Sat, 3 Aug 2013, David Winsemius wrote:

> 'tapply' lets one apply a function to tabulated items. There are 
> 'describe' functions in a variety of packages.
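
As a rough illustration of the tapply suggestion (a sketch on made-up data, not code from the reply; the data frame `d` and its values are hypothetical), passing a list of factors as the index yields one array entry per cell of the design:

```r
# Hypothetical toy data: two 2-level factors and a numeric response.
d <- data.frame(
  Sex      = factor(rep(c("Female", "Male"), each = 4)),
  Zygosity = factor(rep(c("DZ", "MZ"), times = 4)),
  Age      = c(17, 18, 19, 20, 40, 42, 44, 46)
)

# One mean and one SD per cell; each result is a Sex x Zygosity matrix.
cell_mean <- with(d, tapply(Age, list(Sex = Sex, Zygosity = Zygosity), mean))
cell_sd   <- with(d, tapply(Age, list(Sex = Sex, Zygosity = Zygosity), sd))

print(cell_mean)
print(cell_sd)
```

With five factors the same call returns a five-dimensional array, which is harder to read than the flat table summaryBy produces.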


I'll study the other two suggestions later, but the first answer solved 
my problem perfectly.  I also wanted the 25% and 75% quantiles, so I wrote 
small functions for those and then ran the code shown below.  (Code and 
output at the end.)

Note that the neat fivenum() function returns the minimum, lower hinge, 
median, upper hinge and maximum.  The hinges are usually close to (though 
not always identical to) the 25% and 75% quantiles, so it could nearly 
replace my q25 and q75 functions, but having one function return a vector 
instead of a scalar seems to mess up the column naming scheme.  Using this 
function list...

FUN=c(mean, sd, min, q25, median, q75, max, length)

...gave me these column names:

Age.mean Age.sd Age.min Age.q25 Age.median Age.q75 Age.max Age.length

These are what I want, but using this function list...

FUN=c(mean, sd, length, fivenum)

...gave me these much less descriptive numbered column names:

Age.FUN1 Age.FUN2 Age.FUN3 Age.FUN4 Age.FUN5 Age.FUN6 Age.FUN7 Age.FUN8

That is, summaryBy probably compares the total number of values returned 
by the functions with the number of functions in the list.  If the two 
are equal, it labels each column with the name of its function. 
Otherwise it cannot tell which function produced which value, so it falls 
back on a numbering scheme.
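
One possible way around the naming problem is to wrap everything in a single function that returns a named vector, so the labels travel with the values. The sketch below uses base R's aggregate() on made-up data rather than summaryBy; the helper names `five` and `stats`, the labels `hinge1`/`hinge3`, and the data frame `d` are all hypothetical. Whether summaryBy picks up such names in the same way has not been verified here.

```r
# Wrapper that labels the five numbers fivenum() returns.
# Elements 2 and 4 are Tukey's hinges, which can differ
# slightly from quantile(x, .25) and quantile(x, .75).
five <- function(x) {
  setNames(fivenum(x), c("min", "hinge1", "median", "hinge3", "max"))
}

# One function returning a named vector of all the statistics.
stats <- function(x) c(mean = mean(x), sd = sd(x), five(x), n = length(x))

# Hypothetical toy data with two cells.
d <- data.frame(g   = rep(c("a", "b"), each = 4),
                Age = c(1, 2, 3, 4, 10, 20, 30, 40))

# aggregate() stores the vector results in a matrix column;
# do.call(data.frame, ...) flattens it into Age.mean, Age.sd, ...
res <- do.call(data.frame, aggregate(Age ~ g, data = d, FUN = stats))
print(names(res))
print(res)
```

In this toy example the lower hinge for group "a" is 1.5, while quantile(1:4, .25) gives 1.75, which shows the two definitions are not interchangeable for small cells.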


My code:


> x <- read.delim("ID_data.txt", colClasses=c("character","factor","numeric",rep("factor",4)))
> str(x)
'data.frame':   4434 obs. of  7 variables:
   $ ID        : chr  "200" "201" "211" "2000" ...
   $ Cohort    : Factor w/ 2 levels "11","17": 2 2 2 2 2 2 2 2 2 2 ...
   $ Age       : num  18.1 18.1 49.2 18 18 ...
   $ Sex       : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
   $ Zygosity  : Factor w/ 2 levels "DZ","MZ": 2 2 2 1 1 1 1 2 2 2 ...
   $ Generation: Factor w/ 2 levels "Offspring","Parent": 1 1 2 1 1 1 1 1 1 2 ...
   $ ESstatus  : Factor w/ 2 levels "ES","notES": 2 2 2 2 2 2 2 2 2 2 ...
> install.packages("doBy")
> library(doBy)
> q25 <- function(x){quantile(x,.25,names=FALSE)}
> q75 <- function(x){quantile(x,.75,names=FALSE)}
> summaryBy(Age ~ Generation + Zygosity + Sex + Cohort + ESstatus, data=x, FUN=c(mean, sd, min, q25, median, q75, max, length))
   Generation Zygosity    Sex Cohort ESstatus Age.mean    Age.sd Age.min Age.q25 Age.median Age.q75 Age.max Age.length
1   Offspring       DZ Female     11       ES 17.78528 0.3535863   16.93 17.6000     17.775 17.9650   18.92        106
2   Offspring       DZ Female     11    notES 18.13679 0.5555968   16.76 17.8525     18.190 18.4575   19.50        162
3   Offspring       DZ Female     17    notES 17.47529 0.4569588   16.56 17.0700     17.590 17.8700   18.29        191
4   Offspring       DZ   Male     11       ES 17.76149 0.3467540   17.18 17.5150     17.715 18.0000   18.71        134
5   Offspring       DZ   Male     11    notES 17.87667 0.5187333   16.83 17.4600     17.860 18.2400   19.02        153
6   Offspring       DZ   Male     17    notES 17.50418 0.3915823   16.73 17.1900     17.530 17.8300   18.52        165
7   Offspring       MZ Female     11       ES 17.87628 0.4506530   16.86 17.6775     17.805 18.1000   19.12        196
8   Offspring       MZ Female     11    notES 18.05739 0.6103713   16.76 17.6300     18.050 18.4200   19.70        291
9   Offspring       MZ Female     17    notES 17.41061 0.4956190   16.55 16.9700     17.340 17.8200   18.45        395
10  Offspring       MZ   Male     11       ES 17.77174 0.3236917   16.84 17.5800     17.790 17.9700   19.02        195
11  Offspring       MZ   Male     11    notES 17.87718 0.6472397   16.56 17.3300     17.855 18.2100   20.01        284
12  Offspring       MZ   Male     17    notES 17.49114 0.3961757   16.65 17.1775     17.500 17.8100   18.35        332
13     Parent       DZ Female     11       ES 44.61512 5.1246314   32.17 41.3400     44.680 48.2800   57.95        121
14     Parent       DZ Female     11    notES 42.54346 4.3670998   34.03 39.3450     42.110 45.5500   57.06        107
15     Parent       DZ Female     17    notES 46.30559 4.9177705   36.10 42.7275     45.765 48.3350   62.69         68
16     Parent       DZ   Male     11       ES 44.60206 4.5605484   34.31 41.4475     44.890 47.4975   58.75        126
17     Parent       DZ   Male     11    notES 42.71121 4.9600561   32.05 39.2400     42.760 45.2700   58.20        157
18     Parent       DZ   Male     17    notES 46.77458 4.0226198   40.18 44.1250     46.000 48.8200   61.12         59
19     Parent       MZ Female     11       ES 44.23476 5.0214627   29.55 40.6925     44.125 47.7300   56.73        206
20     Parent       MZ Female     11    notES 42.31988 5.3622671   30.31 38.6050     41.835 46.0175   56.58        172
21     Parent       MZ Female     17    notES 46.36490 5.1770435   34.88 42.4200     45.950 49.4950   63.18        155
22     Parent       MZ   Male     11       ES 43.40787 5.3507439   31.28 39.9700     43.440 46.4800   64.65        197
23     Parent       MZ   Male     11    notES 41.56363 4.6564818   32.10 38.0250     41.390 44.6450   65.29        331
24     Parent       MZ   Male     17    notES 46.69298 5.2421896   34.45 43.1500     45.890 49.0050   63.80        131
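
The table above has 24 rows rather than 32: no row combines Cohort 17 with ES, presumably because those cells contain no observations. A minimal sketch of how one might check cell counts, including empty cells, with table() follows; the toy data frame `d` is hypothetical and only mimics the missing combination.

```r
# Hypothetical toy data: the ("17", "ES") combination never occurs.
d <- data.frame(
  Cohort   = factor(c("11", "11", "11", "17", "17")),
  ESstatus = factor(c("ES", "notES", "ES", "notES", "notES"))
)

# Cross-tabulate the factors; empty cells show up as zero counts.
counts <- table(d$Cohort, d$ESstatus, dnn = c("Cohort", "ESstatus"))
print(counts)

# List the empty cells as a data frame of factor combinations.
cells <- as.data.frame(counts)
print(cells[cells$Freq == 0, ])
```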


Thanks very much.

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

