[R] Better way to create tables of mean & standard deviations

Tue Nov 7 15:53:22 CET 2006

Thank you, that was exactly what I was looking for.

On 11/7/06, hadley wickham <h.wickham at gmail.com> wrote:
> > > I can only think of rather complex ways to solve the labeling issue...
> > >
> > > I would appreciate it if someone could point out if there are
> > > better/cleaner/easier ways of achieving what I'm trying to do.
> >
> >   Does this help?
> >
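> > g <- function(y) {
> >   # one column of results (Mean, SD, N) per column of y
> >   s <- apply(y, 2,
> >              function(z) {
> >                z <- z[!is.na(z)]   # drop missing values
> >                n <- length(z)
> >                # every branch returns the same three named values,
> >                # so apply() always builds a 3-row matrix
> >                if(n==0) c(Mean=NA, SD=NA, N=0) else
> >                if(n==1) c(Mean=z[[1]], SD=NA, N=1) else
> >                  c(Mean=mean(z), SD=sd(z), N=n)
> >              })
> >   # flatten to a named vector: MeanY, SDY, NY, ...
> >   w <- as.vector(s)
> >   names(w) <- as.vector(outer(rownames(s), colnames(s), paste, sep=''))
> >   w
> > }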
> >
> > df <- data.frame(LAB = rep(1:8, each=60), BATCH = rep(c(1,2), 240),
> >                  Y = rnorm(480))
> >
> > library(Hmisc)
> >
> > with(df, summarize(cbind(Y),
> >                    llist(LAB, BATCH),
> >                    FUN = g,
> >                    stat.name=c("mean", "stdev", "n")))
> >
> >    LAB BATCH        mean     stdev  n
> > 1    1     1  0.13467569 1.0623188 30
> > 2    1     2  0.15204232 1.0464287 30
> > 3    2     1 -0.14470044 0.7881942 30
> > 4    2     2 -0.34641739 0.9997924 30
> > 5    3     1 -0.17915298 0.9720036 30
> > 6    3     2 -0.13942702 0.8166447 30
> > 7    4     1  0.08761900 0.9046908 30
> > 8    4     2  0.27103640 0.7692970 30
> > 9    5     1  0.08017377 1.1537611 30
> > 10   5     2  0.01475674 1.0598336 30
> > 11   6     1  0.29208572 0.8006171 30
> > 12   6     2  0.10239509 1.1632274 30
> > 13   7     1 -0.35550603 1.2016190 30
> > 14   7     2 -0.33692452 1.0458184 30
> > 15   8     1 -0.03779253 1.0385098 30
> > 16   8     2 -0.18652758 1.1768540 30
> >
> > with(df, summarize(cbind(Y),
> >                    llist(LAB),
> >                    FUN = g,
> >                    stat.name=c("mean", "stdev", "n")))
> >
> >   LAB        mean     stdev  n
> > 1   1  0.14335900 1.0454666 60
> > 2   2 -0.24555892 0.8983465 60
> > 3   3 -0.15929000 0.8902766 60
> > 4   4  0.17932770 0.8377011 60
> > 5   5  0.04746526 1.0988603 60
> > 6   6  0.19724041 0.9946316 60
> > 7   7 -0.34621527 1.1168682 60
> > 8   8 -0.11216005 1.1029466 60
> >
> >   Once you write the summary function g, it's not that complex.  See
> > ?summarize in the Hmisc package for more detail.  Also, you might take a
> > look at the doBy and reshape packages.
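For comparison, a minimal base-R sketch of the same two tables without Hmisc. It assumes R >= 2.11 (for the formula method of aggregate()), uses a hypothetical helper named mean_sd_n, and adds set.seed() only so the sketch is reproducible, so its numbers will not match the output above. Note that aggregate() drops rows with a missing Y before grouping (its default na.action), which plays the role of the NA handling inside g.

```r
# Base-R sketch (no Hmisc); assumes R >= 2.11 for aggregate()'s formula method
set.seed(1)  # added only so the sketch is reproducible
df <- data.frame(LAB = rep(1:8, each = 60), BATCH = rep(c(1, 2), 240),
                 Y = rnorm(480))

# hypothetical helper: returns the three statistics as a named vector
mean_sd_n <- function(v) c(mean = mean(v), sd = sd(v), n = length(v))

# one row per LAB x BATCH cell; Y comes back as a 3-column matrix
res <- aggregate(Y ~ LAB + BATCH, data = df, FUN = mean_sd_n)
# flatten the matrix column into Y.mean, Y.sd, Y.n
res <- do.call(data.frame, res)
# sort like the summarize() output: LAB first, then BATCH
res <- res[order(res$LAB, res$BATCH), ]
rownames(res) <- NULL
print(res)

# collapsing over BATCH: one row per LAB
res_lab <- do.call(data.frame, aggregate(Y ~ LAB, data = df, FUN = mean_sd_n))
print(res_lab)
```

The do.call(data.frame, ...) step is only there because aggregate() stores a multi-valued FUN result as a single matrix column; flattening it gives one plain column per statistic.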
>
> With the reshape package, I'd do it like this:
>
> df <- data.frame(LAB = rep(1:8, each=60), BATCH = rep(c(1,2), 240),
>                  Y = rnorm(480))
> library(reshape)
> dfm <- melt(df, measured="Y")
>
> cast(dfm, LAB ~ ., c(mean, sd, length))
> cast(dfm, LAB + BATCH ~ ., c(mean, sd, length))
> cast(dfm, LAB + BATCH ~ ., c(mean, sd, length), margins=TRUE)
>
> Regards,
>
> Hadley
>
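The margins=TRUE call above adds summaries over all observations. A rough base-R analogue for the one-factor case is sketched below: per-LAB statistics, then one extra row computed from the raw data. The helper name mean_sd_n is hypothetical, the row label "(all)" is only chosen to resemble reshape's margin label, and set.seed() is added for reproducibility, so the numbers differ from any output shown in this thread.

```r
# Base-R sketch of a grand-total margin; helper name is hypothetical
set.seed(1)  # reproducible toy data, not the data used above
df <- data.frame(LAB = rep(1:8, each = 60), BATCH = rep(c(1, 2), 240),
                 Y = rnorm(480))

mean_sd_n <- function(v) c(mean = mean(v), sd = sd(v), n = length(v))

# one row per LAB: sapply() gives a 3 x 8 matrix, t() turns it into 8 x 3
by_lab <- t(sapply(split(df$Y, df$LAB), mean_sd_n))
# append the statistics over all 480 observations as a final row
tab <- rbind(by_lab, "(all)" = mean_sd_n(df$Y))
print(tab)
```

The margin row is computed from the raw Y values rather than from the per-LAB rows, because an overall SD cannot be obtained by averaging group SDs.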