[R] summarizing a complex dataframe

Bert Gunter gunter.berton at gene.com
Wed Jan 11 23:09:12 CET 2012


Well, if I understand what you want to do, it's straightforward: see ?"["
(pay attention to indexing by column names) and ?grep, which would pick
out the columns you want. You could then use mapply or maybe rowMeans
or whatever to get your summaries.
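
As a rough illustration of the column-selection approach, here is a minimal sketch on toy data modeled on the example in the quoted question. The object names (df, ids, row_means) are made up for illustration, sapply stands in for mapply, and the code assumes every column name ends in "_<number>" for the replicate.

```r
# Toy data modeled on the example in the quoted question
df <- data.frame(
  m1_1 = c(1, 2, 2), m2_1 = c(1, 1, 2), m3_1 = c(1, 1, 1),
  m1_2 = c(2, 2, 2), m2_2 = c(2, 2, 2), m3_2 = c(2, 2, 2)
)

# Assumption: names look like "<ID>_<rep>"; strip the suffix to get the IDs
ids <- unique(sub("_[0-9]+$", "", names(df)))

# For each ID, select its replicate columns by name and average them by row
row_means <- sapply(ids, function(id) {
  cols <- grep(paste0("^", id, "_[0-9]+$"), names(df), value = TRUE)
  rowMeans(df[, cols, drop = FALSE])
})

print(row_means)  # one row per original row, one column per ID
```

The drop = FALSE keeps the selection a data frame even when an ID has only one replicate, so rowMeans still works.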

HOWEVER ... I think what you really should do is use a more
appropriate data structure. What seems more natural to me is to
convert from "wide" to "long" format so that you end up with 3
columns: Result, ID, Rep. Result would be the value, ID your
m1, m2, etc., and Rep your _1, _2, _3, etc. Again, this appears to be
easy: ?unlist would get you the vector of Results, and either ?strsplit
or ?grep would get you all the IDs and Reps, each of which just has to
be repeated (?rep) once for each row of your frame. Alternatively,
?reshape in base R or the reshape package can probably do it for you.
Once you have a more R-friendly data structure, it will be much easier
for you to work with your data.
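
The wide-to-long conversion might look something like the sketch below, using unlist, strsplit and rep on the same toy data. The names (long, parts, means) are hypothetical, and the extra Row column is an assumption added here so that the means can still be taken per original row. It also assumes each column name contains exactly one underscore, separating the ID from the Rep.

```r
# Toy data modeled on the example in the quoted question
df <- data.frame(
  m1_1 = c(1, 2, 2), m2_1 = c(1, 1, 2), m3_1 = c(1, 1, 1),
  m1_2 = c(2, 2, 2), m2_2 = c(2, 2, 2), m3_2 = c(2, 2, 2)
)

# Assumption: exactly one "_" per name, so each split yields c(ID, Rep)
parts <- strsplit(names(df), "_", fixed = TRUE)

# unlist() stacks the columns one after another, so the ID and Rep of each
# column are repeated once per row, and the row index cycles once per column
long <- data.frame(
  Row    = rep(seq_len(nrow(df)), times = ncol(df)),
  ID     = rep(sapply(parts, `[`, 1), each = nrow(df)),
  Rep    = rep(sapply(parts, `[`, 2), each = nrow(df)),
  Result = unlist(df, use.names = FALSE)
)

# Mean over replicates for each original row and each ID
means <- aggregate(Result ~ Row + ID, data = long, FUN = mean)
print(means)
```

Once the data are in this shape, the number of IDs and replicates no longer matters to the summary step; the same aggregate call applies to frames of any width.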

Finally, you may wish to post your query on a more relevant list (e.g.
geo or ecology or whatever your data are) as folks there may have
better ideas for what a more "R friendly data structure" should be.

Cheers,
Bert



On Wed, Jan 11, 2012 at 12:55 PM, Christopher G Oakley
<coakley at bio.fsu.edu> wrote:
> I need some help summarizing complex data frames (small example below):
>
>    m1_1 m2_1 m3_1 m1_2 m2_2 m3_2
> i1    1    1    1    2    2    2
> i1    2    1    1    2    2    2
> i2    2    2    1    2    2    2
>
>
> For an arbitrary number of columns (say m1 …. m199) where the column names have variable patterns,
>
> and such that each set of columns is repeated (with potentially unique data) an arbitrary number of times (say _1 … _1000),
>
> I would like to summarize by row the mean values of (m1, m2, m3, … m199) over all replicates (_1, _2, _3, … _1000). I need to do this with a large number of dataframes of variable nrow, ncolumn, and colnames.
>
> I've tried various loops creating new dataframes and reassigning cell values in loops or using rbind and bind, but run into trouble in each case.
>
> Any ideas?
>
> Thanks,
>
> Chris



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


