[R] Help with one of "those" apply functions

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Feb 2 23:34:34 CET 2011


Hi,

On Wed, Feb 2, 2011 at 4:08 PM, Robin Jeffries <rjeffries at ucla.edu> wrote:
> Hello there,
>
> I'm still struggling with the *apply commands. I have 5 people with ids
> p1 to p5. I have varying amounts (nrep) of repeated outcome (value)
> measured on them.
>
> nrep <- 1:5
> id    <- rep(c("p1", "p2", "p3", "p4", "p5"), nrep)
> value <- rnorm(length(id))
>
> I want to create a new vector that contains the sum of the values per
> person.
>
> subject.value[1] <- value[1]    # 1 measurement
> subject.value[2] <- sum(value[2:3]) # the next 2 measurements
> ...
> subject.value[5] <- sum(value[11:15])  # the next 5 measurements
>
>
> I'd imagine it'll be some sort of *apply(value, nrep, sum) but I can't seem
> to land on the right format.
>
> Can someone give me a heads up as to what the correct syntax and function
> is?

In addition to tapply (as Phil pointed out), you can look at the
functions in plyr.

I somehow find them more intuitive, at times, than their sister "base"
functions, especially since more often than not you'll have your data
in a data.frame.

For instance:

R> set.seed(123)
R> nrep <- 1:5
R> id <- rep(c("p1", "p2", "p3", "p4", "p5"), nrep)
R> value <- rnorm(length(id))
R> DF <- data.frame(id=id, value=value)

R> tapply(value, id, sum)
        p1         p2         p3         p4         p5
-0.5604756  1.3285308  1.9148611 -1.9366599  1.5395087

R> library(plyr)
R> ddply(DF, .(id), summarize, total=sum(value))
  id      total
1 p1 -0.5604756
2 p2  1.3285308
3 p3  1.9148611
4 p4 -1.9366599
5 p5  1.5395087

In this case, though, I'll grant you that tapply is simpler if you
already know how to use it.
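
If it helps to see what tapply is doing under the hood, here is a rough
sketch using only base R. It rebuilds the same objects as above; the
names by.hand, agg and rs are just ones I made up for illustration.
split() breaks value into one piece per id, and sapply() sums each
piece. aggregate() and rowsum() are two other base routes to the same
per-person totals, in case you prefer a data.frame or a matrix back.

```r
set.seed(123)
nrep  <- 1:5
id    <- rep(c("p1", "p2", "p3", "p4", "p5"), nrep)
value <- rnorm(length(id))
DF    <- data.frame(id = id, value = value)

# split() groups the values by id; sapply() then sums each group.
# This is roughly what tapply(value, id, sum) does in a single call.
by.hand <- sapply(split(value, id), sum)

# aggregate() takes a data.frame and returns one (columns: id, value).
agg <- aggregate(value ~ id, data = DF, FUN = sum)

# rowsum() is specialised for grouped sums; it returns a one-column
# matrix whose row names are the ids.
rs <- rowsum(value, id)
```

All three give one total per person, in the order p1 to p5, so any of
them could stand in for the subject.value vector you describe.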

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


