[R] summarizing a complex dataframe

jim holtman jholtman at gmail.com
Thu Jan 12 02:30:12 CET 2012


Will this do it for you?

> x <- read.table(text = "   m1_1 m2_1 m3_1 m1_2 m2_2 m3_2
+    1    1    1    2    2    2
+     2    1    1    2    2    2
+     2    2    1    2    2    2", header = TRUE)
> # split out the main names of the column
> x.names <- do.call(rbind, strsplit(names(x), "_"))
> x.names
     [,1] [,2]
[1,] "m1" "1"
[2,] "m2" "1"
[3,] "m3" "1"
[4,] "m1" "2"
[5,] "m2" "2"
[6,] "m3" "2"
> # now create a list with the indices of the major columns
> x.indx <- split(seq_len(nrow(x.names)), x.names[,1])
> x.indx
$m1
[1] 1 4

$m2
[1] 2 5

$m3
[1] 3 6

> # now compute the means of each row and major group
> means <- lapply(x.indx, function(a) rowMeans(x[, a, drop = FALSE]))
> # cbind the results and put the major names on the new columns
> # put the results back in the data
> cbind(x, do.call(cbind, means))
  m1_1 m2_1 m3_1 m1_2 m2_2 m3_2  m1  m2  m3
1    1    1    1    2    2    2 1.5 1.5 1.5
2    2    1    1    2    2    2 2.0 1.5 1.5
3    2    2    1    2    2    2 2.0 2.0 1.5
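
If you have to do this for a lot of data frames, the steps above could be wrapped in a function. This is only a sketch: the name group_row_means is made up here, it assumes the replicate number follows the last underscore in each column name, and it assumes every column is numeric.

```r
# Sketch only: 'group_row_means' is a hypothetical helper, not part of any package.
# Assumes column names look like <group>_<replicate> and all columns are numeric.
group_row_means <- function(df) {
    # strip the trailing "_<replicate>" to get the group name of each column
    groups <- sub("_[^_]*$", "", names(df))
    # column indices for each group, kept in order of first appearance
    indx <- split(seq_along(groups), factor(groups, levels = unique(groups)))
    # row means per group; drop = FALSE keeps a data frame even with one replicate
    means <- lapply(indx, function(a) rowMeans(df[, a, drop = FALSE]))
    # append the group means to the original columns
    cbind(df, do.call(cbind, means))
}

# same small example as above
x <- read.table(text = "m1_1 m2_1 m3_1 m1_2 m2_2 m3_2
1 1 1 2 2 2
2 1 1 2 2 2
2 2 1 2 2 2", header = TRUE)

result <- group_row_means(x)
```

With a list of data frames (call it list_of_dfs, a placeholder name), lapply(list_of_dfs, group_row_means) would apply the same summary to each one.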

On Wed, Jan 11, 2012 at 5:12 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Jan 11, 2012, at 3:55 PM, Christopher G Oakley wrote:
>
>> I need some help summarizing complex data frames (small example below):
>>
>>   m1_1 m2_1 m3_1 m1_2 m2_2 m3_2
>> i1    1    1    1    2    2    2
>> i1    2    1    1    2    2    2
>> i2    2    2    1    2    2    2
>>
>>
>> For an arbitrary number of columns (say m1 … m199) where the column names
>> have variable patterns,
>>
>> and such that each set of columns is repeated (with potentially unique
>> data) an arbitrary number of times (say _1 … _1000),
>>
>> I would like to summarize by row the mean values of (m1, m2, m3, … m199)
>> over all replicates (_1, _2, _3, … _1000). I need to do this with a large
>> number of dataframes of variable nrow, ncolumn, and colnames.
>
>
> Something along the lines of this untested code:
>
> sapply(unique(sub("_.+$", "", names(dfrm))),
>          function(x)  rowMeans( dfrm[ , grep(paste0("^", x, "_"), names(dfrm)), drop = FALSE ] )
>        )
>
> Post a reproducible example and we can test it.
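
One thing worth checking when matching columns by name: grep() treats a bare prefix such as "m1" as a regular expression, so it also matches names like "m10_1", which matters with m1 … m199. A small check with made-up column names (not from the original data) shows the difference. Anchoring the pattern is one way around it.

```r
# made-up column names for illustration only
nms <- c("m1_1", "m2_1", "m10_1", "m1_2", "m2_2", "m10_2")

# unanchored: picks up the m10 columns as well
loose <- grep("m1", nms, value = TRUE)

# anchored at the start and ending with the underscore: only the m1 columns
strict <- grep("^m1_", nms, value = TRUE)
```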
>
>
>>
>> I've tried various loops creating new dataframes and reassigning cell
>> values in loops or using rbind and cbind, but run into trouble in each case.
>>
>> Any ideas?
>>
>> Thanks,
>>
>> Chris
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
> David Winsemius, MD
> West Hartford, CT



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


