[R] Accessing data in groups created with split() and other beginner questions

Mon Mar 22 14:33:36 CET 2010

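To extract a single element of a list (which is what split() returns),
you need to use "[[" rather than "[". Single brackets return a sub-list
of length one, which is why summary(temp[1]) only describes the list
itself.

Therefore,

summary(temp[[1]])

is what you meant to use. To get the summaries for every subject at
once, use

summ <- lapply(temp, summary)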
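
To make the difference between "[" and "[[" concrete, here is a minimal
sketch. The small 'set' data frame below is a made-up stand-in for your
data (only the subject and yvalue columns, with invented values), so
treat it as an illustration and not as your actual result.

```r
# Toy stand-in for the 'set' data frame from the question (values are made up)
set <- data.frame(
  subject = c(1, 1, 1, 2, 2),
  yvalue  = c(12, 15, 17, 20, 24)
)

temp <- split(set, set$subject)

class(temp[1])    # "list": a sub-list holding one data frame
class(temp[[1]])  # "data.frame": the data frame itself

# temp[1]$yvalue is NULL because the sub-list has no element named
# 'yvalue', so lapply() over it returns an empty list.
is.null(temp[1]$yvalue)

# Mean of yvalue for the first subject
first_mean <- mean(temp[[1]]$yvalue)

# Mean and standard deviation of yvalue for every subject
means <- sapply(temp, function(d) mean(d$yvalue))
sds   <- sapply(temp, function(d) sd(d$yvalue))
```

The anonymous function receives one subject's data frame at a time, so
any per-group statistic can be computed the same way.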

About producing PDFs, I'd recommend taking a look at Sweave (
http://www.statistik.lmu.de/~leisch/Sweave/ )
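
For the iteration itself, one possible approach is to split by subject
and then by trace, drawing one plot per trace. The sketch below uses the
base pdf() graphics device instead of Sweave, simply to show the loop
structure. The toy data, the output path, and the trace_stats list are
assumptions made up for this illustration; only the column names come
from your message.

```r
# Toy stand-in for 'set' (made-up values; column names follow the question)
set <- data.frame(
  subject   = c(1, 1, 1, 1, 2, 2),
  timestamp = as.POSIXct("1992-07-12 06:05:00", tz = "UTC") + 300 * (0:5),
  yvalue    = c(12, 15, 17, 20, 24, 21),
  subjtrace = c("1-1", "1-1", "1-2", "1-2", "2-1", "2-1")
)

out_file <- file.path(tempdir(), "traces.pdf")  # hypothetical output path
pdf(out_file)                                   # one page per plot

trace_stats <- list()
for (subj in split(set, set$subject)) {
  # drop = TRUE discards trace labels that do not occur for this subject
  for (tr in split(subj, subj$subjtrace, drop = TRUE)) {
    id <- as.character(tr$subjtrace[1])
    plot(tr$timestamp, tr$yvalue, type = "l",
         main = paste("Trace", id), xlab = "timestamp", ylab = "yvalue")
    # Keep the summary statistics for this trace, keyed by its label
    trace_stats[[id]] <- summary(tr$yvalue)
  }
}

dev.off()  # close the device so the file is written
```

After the loop, trace_stats holds one summary per trace, which can be
printed next to the corresponding plot in whatever report you build.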

b

On Mon, Mar 22, 2010 at 1:27 PM, Clay Heaton <ccheaton at gmail.com> wrote:
> Hi, very new to R here...
>
> I have a data frame called 'set' with 100k+ rows in it that looks like this:
>
>  subject           timestamp  yvalue traceabs subjtrace
> 1       1 1992-07-12 06:05:00      12        1       1-1
> 2       1 1992-07-12 06:10:00      15        1       1-1
> 3       1 1992-07-12 06:15:00      17        1       1-1
> 4       1 1992-07-12 06:20:00      20        1       1-1
> 5       1 1992-07-12 06:25:00      24        1       1-1
> ....
>
> There are 89 subjects, each of which has a different number of traces -- it's time series data. There are, in total, around 180 traces. The "subjtrace" variable is just a concatenation of the subject number, a hyphen, and the relative trace number. For instance, the first trace for subject 46 is "46-1" but the traceabs value for the same trace is 71.
>
> I need to perform simple statistics on each subject and on each trace. I also need to graph each trace.
>
> It seems like the easy approach to identifying the variables would be to use the split() function to create groups:
>
>> temp <- split(set, set$subject)
>
> When I then try, for example:
>
>> summary(temp[1])
>
> all I get as a result is:
>  Length Class      Mode
> 1 5      data.frame list
>
> So I went with:
>
>> lapply(temp[1], summary)
>
> That works, but I'm unable to do something like:
>
>> lapply(temp[1]$yvalue, mean)
>
> because the result returned is:
> list()
>
> Ultimately, I'm trying to run the exact same code on each group, as defined by the subject number, and each trace. I would like to display something like the following:
>
> Subject # and Summary Statistics
> -- Graph of a trace belonging to the subject
> -- Summary statistics for the trace
> -- Graph of the next trace belonging to the subject
> -- Summary statistics for the trace
> -- etc...
>
> My intention is to dump this all into a .pdf file with Sweave and LaTeX.
>
> Questions:
> - Is split() the best function to use to create the proper groups? or should I look to create a separate variable for each group using subset, like:
> temp.46 <- subset(set, subject==46,select=c(subject, timestamp, yvalue, subjtrace))
>
> - How do I call functions on data within the groups created by split()? Like...
> lapply(temp[1]$yvalue, sd)
>
> - In an effort to try to learn the proper way to approach this, what would be the best practice for iterating through the data and pushing it to .pdf?
>
> Thanks!