[R] Better way to create tables of mean & standard deviations

hadley wickham h.wickham at gmail.com
Tue Nov 7 14:15:48 CET 2006


> > I can only think of rather complex ways to solve the labeling issue...
> >
> > I would appreciate it if someone could point out if there are
> > better/cleaner/easier ways of achieving what I'm trying to do.
>
>   Does this help?
>
> g <- function(y) {
>   s <- apply(y, 2,
>              function(z) {
>                z <- z[!is.na(z)]
>                n <- length(z)
>                # always return three named values so the result stays a matrix
>                if(n==0) c(Mean=NA, SD=NA, N=0) else
>                if(n==1) c(Mean=z, SD=NA, N=1) else {
>                  m <- mean(z)
>                  s <- sd(z)
>                  c(Mean=m, SD=s, N=n)
>                }
>              })
>   w <- as.vector(s)
>   names(w) <-  as.vector(outer(rownames(s), colnames(s), paste, sep=''))
>   w
> }
>
> df <- data.frame(LAB = rep(1:8, each=60), BATCH = rep(c(1,2), 240), Y =
> rnorm(480))
>
> library(Hmisc)
>
> with(df, summarize(cbind(Y),
>                    llist(LAB, BATCH),
>                    FUN = g,
>                    stat.name=c("mean", "stdev", "n")))
>
>    LAB BATCH        mean     stdev  n
> 1    1     1  0.13467569 1.0623188 30
> 2    1     2  0.15204232 1.0464287 30
> 3    2     1 -0.14470044 0.7881942 30
> 4    2     2 -0.34641739 0.9997924 30
> 5    3     1 -0.17915298 0.9720036 30
> 6    3     2 -0.13942702 0.8166447 30
> 7    4     1  0.08761900 0.9046908 30
> 8    4     2  0.27103640 0.7692970 30
> 9    5     1  0.08017377 1.1537611 30
> 10   5     2  0.01475674 1.0598336 30
> 11   6     1  0.29208572 0.8006171 30
> 12   6     2  0.10239509 1.1632274 30
> 13   7     1 -0.35550603 1.2016190 30
> 14   7     2 -0.33692452 1.0458184 30
> 15   8     1 -0.03779253 1.0385098 30
> 16   8     2 -0.18652758 1.1768540 30
>
> with(df, summarize(cbind(Y),
>                    llist(LAB),
>                    FUN = g,
>                    stat.name=c("mean", "stdev", "n")))
>
>   LAB        mean     stdev  n
> 1   1  0.14335900 1.0454666 60
> 2   2 -0.24555892 0.8983465 60
> 3   3 -0.15929000 0.8902766 60
> 4   4  0.17932770 0.8377011 60
> 5   5  0.04746526 1.0988603 60
> 6   6  0.19724041 0.9946316 60
> 7   7 -0.34621527 1.1168682 60
> 8   8 -0.11216005 1.1029466 60
>
>   Once you write the summary function g, it's not that complex.  See
> ?summarize in the Hmisc package for more detail.  Also, you might take a
> look at the doBy and reshape packages.

With the reshape package, I'd do it like this:

library(reshape)

df <- data.frame(LAB = rep(1:8, each=60), BATCH = rep(c(1,2), 240),
                 Y = rnorm(480))
dfm <- melt(df, measure.vars="Y")

cast(dfm, LAB ~ ., c(mean, sd, length))
cast(dfm, LAB + BATCH ~ ., c(mean, sd, length))
cast(dfm, LAB + BATCH ~ ., c(mean, sd, length), margins=TRUE)
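
If neither package is at hand, roughly the same table can be put together in base R. What follows is only a minimal sketch, assuming the same simulated df as above; msn is a hypothetical helper name, and the set.seed() call is added purely so the numbers are reproducible.

```r
# Base R sketch: mean, SD and N of Y for each LAB x BATCH cell.
set.seed(1)  # assumption: seed added only to make the example reproducible
df <- data.frame(LAB = rep(1:8, each = 60),
                 BATCH = rep(c(1, 2), 240),
                 Y = rnorm(480))

# Hypothetical helper (not part of reshape or Hmisc):
# summarise one numeric vector after dropping missing values.
msn <- function(z) {
  z <- z[!is.na(z)]
  c(mean = mean(z), sd = sd(z), n = length(z))
}

# aggregate() stores the three statistics as a matrix column named Y;
# do.call(data.frame, ...) flattens it into Y.mean, Y.sd and Y.n.
res <- do.call(data.frame, aggregate(Y ~ LAB + BATCH, data = df, FUN = msn))
res <- res[order(res$LAB, res$BATCH), ]
print(res)
```

Dropping BATCH from the formula (Y ~ LAB) should give the per-lab summary in the same way.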

Regards,

Hadley


