[R] summarizing a complex dataframe

Thu Jan 12 02:30:12 CET 2012

Will this do it for you?

> x <- read.table(text = "   m1_1 m2_1 m3_1 m1_2 m2_2 m3_2
+    1    1    1    2    2    2
+     2    1    1    2    2    2
+     2    2    1    2    2    2", header = TRUE)
> # split the column names into the major name and the replicate number
> x.names <- do.call(rbind, strsplit(names(x), "_"))
> x.names
     [,1] [,2]
[1,] "m1" "1"
[2,] "m2" "1"
[3,] "m3" "1"
[4,] "m1" "2"
[5,] "m2" "2"
[6,] "m3" "2"
> # now create a list with the indices of the major columns
> x.indx <- split(seq(nrow(x.names)), x.names[,1])
> x.indx
$m1
[1] 1 4

$m2
[1] 2 5

$m3
[1] 3 6

> # now compute the means of each row and major group
> means <- lapply(x.indx, function(a) rowMeans(x[, a, drop = FALSE]))
> # cbind the results and put the major names on the new columns
> # put the results back in the data
> cbind(x, do.call(cbind, means))
  m1_1 m2_1 m3_1 m1_2 m2_2 m3_2  m1  m2  m3
1    1    1    1    2    2    2 1.5 1.5 1.5
2    2    1    1    2    2    2 2.0 1.5 1.5
3    2    2    1    2    2    2 2.0 2.0 1.5
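
Since you have many data frames of varying shape, the same steps could be wrapped in a function. This is only a sketch: the name group_row_means is made up here, and it assumes that every column name ends in "_<replicate>" and that all columns are numeric. It strips only the final "_suffix", so a major name that itself contains an underscore stays in one piece.

```r
# Hypothetical helper (not from the thread): row means per major column group.
# Assumes every column name ends in "_<replicate>" and all columns are numeric.
group_row_means <- function(df) {
  # strip only the final "_suffix", so a name like "a_b_1" maps to group "a_b"
  groups <- sub("_[^_]*$", "", names(df))
  # keep the groups in order of first appearance rather than sorted order
  groups <- factor(groups, levels = unique(groups))
  indx <- split(seq_along(groups), groups)
  # drop = FALSE keeps a one-column group as a data frame so rowMeans() works
  means <- lapply(indx, function(a) rowMeans(df[, a, drop = FALSE]))
  do.call(cbind, means)
}

x <- read.table(text = "m1_1 m2_1 m3_1 m1_2 m2_2 m3_2
1 1 1 2 2 2
2 1 1 2 2 2
2 2 1 2 2 2", header = TRUE)

res <- group_row_means(x)
cbind(x, res)
```

If the data frames are collected in a list, something like lapply(list_of_dfs, group_row_means) would apply it to each one (list_of_dfs being whatever list you build).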

On Wed, Jan 11, 2012 at 5:12 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Jan 11, 2012, at 3:55 PM, Christopher G Oakley wrote:
>
>> I need some help summarizing complex data frames (small example below):
>>
>>   m1_1 m2_1 m3_1 m1_2 m2_2 m3_2
>> i1    1    1    1    2    2    2
>> i1    2    1    1    2    2    2
>> i2    2    2    1    2    2    2
>>
>>
>> For an arbitrary number of columns (say m1 … m199) where the column names
>> have variable patterns,
>>
>> and such that each set of columns is repeated (with potentially unique
>> data) an arbitrary number of times (say _1 … _1000),
>>
>> I would like to summarize by row the mean values of (m1, m2, m3, … m199)
>> over all replicates (_1, _2, _3, … _1000). I need to do this with a large
>> number of dataframes of variable nrow, ncolumn, and colnames.
>
>
> Something along the lines of this untested code:
>
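> sapply(unique(sub("_.+$", "", names(dfrm))),
>          # anchor the pattern so that "m1" does not also match "m10_1", "m11_1", ...
>          function(x)  rowMeans( dfrm[ , grep(paste0("^", x, "_"), names(dfrm)), drop = FALSE ] )
>        )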
>
> Post a reproducible example and we can test it.
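
A small sketch of why a pattern handed to grep() may need anchoring once the names run past m9. The column names below are made up for illustration and are not from the original data: a bare "m1" is also a substring of "m10_1".

```r
# made-up column names, for illustration only
nms <- c("m1_1", "m10_1", "m1_2", "m10_2")

# unanchored: "m1" is a substring of "m10_1" and "m10_2" as well
loose <- grep("m1", nms, value = TRUE)

# anchored at the start and followed by the underscore separator
strict <- grep("^m1_", nms, value = TRUE)

print(loose)
print(strict)
```

Here loose picks up all four names, while strict keeps only the two m1 replicates.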
>
>
>>
>> I've tried various loops creating new dataframes and reassigning cell
>> values in loops or using rbind and cbind, but run into trouble in each case.
>>
>> Any ideas?
>>
>> Thanks,
>>
>> Chris
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
> David Winsemius, MD
> West Hartford, CT
>

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.