[R] summarizing a complex dataframe

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Jan 11 23:04:17 CET 2012


Hi,

On Wed, Jan 11, 2012 at 3:55 PM, Christopher G Oakley
<coakley at bio.fsu.edu> wrote:
> I need some help summarizing complex data frames (small example below):
>
>    m1_1 m2_1 m3_1 m1_2 m2_2 m3_2
> i1    1    1    1    2    2    2
> i1    2    1    1    2    2    2
> i2    2    2    1    2    2    2
>
>
> For an arbitrary number of columns (say m1 …. m199) where the column names have variable patterns,
>
> and such that each set of columns is repeated (with potentially unique data) an arbitrary number of times (say _1 … _1000),

[snip]

Perhaps your job would be easier if you changed the layout of your data
frame. For instance, you could add "experiment.name" and "replicate"
columns, so your "clean" data.frame would look like:

experiment.name   replicate   region   count
m1                1           i1       1
m2                1           i1       1
m3                1           i1       1
...

You can use the reshape (or reshape2) package to help you whip your
old table into a new one using a formula interface, if you like.
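A minimal sketch of that conversion for the small example, using only base R so it runs without extra packages (reshape2::melt(wide, id.vars = "region") followed by splitting its "variable" column would get you to the same place). The helper to_long() is hypothetical. It assumes every measurement column is named <experiment>_<replicate> and splits each name at its last underscore. Because data.frame row names must be unique and "i1" appears twice, the region labels are kept in an ordinary column:

```r
# Wide table from the question. The region labels live in a column because
# data.frame row names must be unique, and "i1" appears twice.
wide <- data.frame(
  region = c("i1", "i1", "i2"),
  m1_1 = c(1, 2, 2), m2_1 = c(1, 1, 2), m3_1 = c(1, 1, 1),
  m1_2 = c(2, 2, 2), m2_2 = c(2, 2, 2), m3_2 = c(2, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack every measurement column into long format.
# A name such as "m2_1" is split at its LAST underscore into
# experiment.name "m2" and replicate 1, so experiment names that contain
# underscores themselves still work.
to_long <- function(wide) {
  value_cols <- setdiff(names(wide), "region")
  pieces <- lapply(value_cols, function(col) {
    data.frame(
      experiment.name = sub("_[^_]*$", "", col),
      replicate = as.integer(sub("^.*_", "", col)),
      region = wide$region,
      count = wide[[col]],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)  # one block of rows per original column
}

long <- to_long(wide)
print(head(long, 3))
```

Nothing here depends on how many m* columns or _N replicates there are, so the same call should cover m1 ... m199 by _1 ... _1000, memory permitting.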

You can then use your favorite split-apply-combine[1] method (via
plyr, data.table, sqldf, or even base::tapply) to calculate summary
statistics over the values of interest in each group/subgroup, or
whatever else you need.
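As a rough illustration, here is the split-apply-combine idea with nothing but base R's tapply() and aggregate(); plyr, data.table or sqldf would express the same grouping in their own syntax. This sketch assumes the long layout described above (columns experiment.name, replicate, region, count) and types the example data in directly so it stands alone:

```r
# Long-format version of the example data (the layout sketched above),
# typed in directly so this snippet runs on its own.
long <- data.frame(
  experiment.name = rep(c("m1", "m2", "m3"), each = 3, times = 2),
  replicate = rep(1:2, each = 9),
  region = rep(c("i1", "i1", "i2"), times = 6),
  count = c(1, 2, 2,  1, 1, 2,  1, 1, 1,   # replicate 1: m1, m2, m3
            2, 2, 2,  2, 2, 2,  2, 2, 2),  # replicate 2: m1, m2, m3
  stringsAsFactors = FALSE
)

# Split by experiment and replicate, apply mean(), combine into a matrix
# (rows = experiment.name, columns = replicate).
means <- tapply(long$count, list(long$experiment.name, long$replicate), mean)
print(means)

# The same strategy with aggregate(), grouping by experiment and region
# instead and returning a data.frame.
by_region <- aggregate(count ~ experiment.name + region, data = long, FUN = mean)
print(by_region)
```

Swapping mean for any other summary function (sd, median, a custom function) or changing the grouping columns is all it takes to answer a different question.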

HTH,
-steve

[1] The Split-Apply-Combine Strategy for Data Analysis:
http://www.jstatsoft.org/v40/i01

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


