[R] conditionally merging adjacent rows in a data frame

Nikhil Kaza nikhil.list at gmail.com
Tue Dec 8 16:29:08 CET 2009


How about creating an index using multiple columns.

  a <- with(d, aggregate(dur, list(rt=rt,tid=tid,mood=mood,roi=roi),  
sum))
  b <- with(d, aggregate(x, list(rt=rt,tid=tid,mood=mood,roi=roi),  
mean))
c <- merge(a, b, by=c("rt","tid","mood", "roi"))

I suppose one could save some time by not running aggregate twice on  
the same dataset, but I am not sure how.

Nikhil

On 8 Dec 2009, at 7:50AM, Titus von der Malsburg wrote:

> Hi, I have a data frame and want to merge adjacent rows if some  
> condition is
> met.  There's an obvious solution using a loop but it is  
> prohibitively slow
> because my data frame is large.  Is there an efficient canonical  
> solution for
> that?
>
>> head(d)
>     rt dur tid  mood roi  x
> 55 5523 200   4  subj   9  5
> 56 5523  52   4  subj   7 31
> 57 5523 209   4  subj   4  9
> 58 5523 188   4  subj   4  7
> 70 4016 264   5 indic   9 51
> 71 4016 195   5 indic   4 14
>
> The desired result would have consecutive rows with the same roi  
> value merged.
> dur values should be added and x values averaged, other values don't  
> differ in
> these rows and should stay the same.
>
>> head(result)
>     rt dur tid  mood roi  x
> 55 5523 200   4  subj   9  5
> 56 5523  52   4  subj   7 31
> 57 5523 397   4  subj   4  8
> 70 4016 264   5 indic   9 51
> 71 4016 195   5 indic   4 14
>
> There's also a solution using reshape.  It uses an index for blocks
>
>  d$index <- cumsum(c(TRUE,diff(d$roi)!=0))
>
> melts and then casts for every column using an appropriate  
> fun.aggregate.
> However, this is a bit cumbersome and also I'm not sure how to make  
> sure that
> I get the original order of rows.
>
> Thanks for any suggestion.
>
>  Titus
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list