[R] Fwd: conditionally merging adjacent rows in a data frame

Gray Calhoun gray.calhoun at gmail.com
Tue Dec 8 15:57:24 CET 2009


I think I forgot to send the original to the mailing list, so I'm
forwarding it (see below).  Sorry about that (and sorry if I did
remember and this is a duplicate).  After a few more minutes of
thought, I realized that you should probably make sure that rt, tid,
and mood are also the same in consecutive rows when constructing the
'consecutiveROI' vector (just as an additional error check).
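
As a rough sketch of that extra check, assuming the column names from
the example data in the question (the helper same_as_previous is a
made-up name, not a built-in):

```r
# Example rows copied from the question quoted below.
d <- data.frame(
  rt   = c(5523, 5523, 5523, 5523, 4016, 4016),
  dur  = c(200, 52, 209, 188, 264, 195),
  tid  = c(4, 4, 4, 4, 5, 5),
  mood = c("subj", "subj", "subj", "subj", "indic", "indic"),
  roi  = c(9, 7, 4, 4, 9, 4),
  x    = c(5, 31, 9, 7, 51, 14),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where an element equals the one before it.
same_as_previous <- function(v) v[-1] == v[-length(v)]

# Rows count as consecutive only if roi AND the trial columns all match.
consecutiveROI <- same_as_previous(d$roi) &
  same_as_previous(d$rt) &
  same_as_previous(d$tid) &
  same_as_previous(d$mood)
```

With this, two rows that happen to share an roi value across a trial
boundary would not be merged.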

There may be a built-in function that could replace the first three
lines, as well.
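
One candidate is rle() from base R, which reports the lengths of runs
of equal values.  A minimal sketch, using only the roi column from the
example in the question:

```r
# roi values from the example data in the question.
roi <- c(9, 7, 4, 4, 9, 4)

# rle() gives the length and value of each run of equal neighbours.
runs <- rle(roi)

# Give every row the number of the run it belongs to.
newindex <- rep(seq_along(runs$lengths), times = runs$lengths)
```

Note that this index numbers the runs (1, 2, 3, ...) rather than
pointing at row positions, so any later step that looks rows up by
index would have to be adjusted to match.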

Best,
Gray

---------- Forwarded message ----------
From: Gray Calhoun <gray.calhoun at gmail.com>
Date: Tue, Dec 8, 2009 at 6:42 AM
Subject: Re: [R] conditionally merging adjacent rows in a data frame
To: Titus von der Malsburg <malsburg at gmail.com>


Hi Titus,
 This solution isn't great and will probably need some work on your
part.  The basic idea is to create a new index that is shared by
consecutive rows with the same value of roi, then just aggregate by
the new index:

> consecutiveROI <- d$roi[-1] == d$roi[1:(length(d$roi)-1)]
> newindex <- 1:dim(d)[1]
> newindex[c(consecutiveROI, FALSE)] <- newindex[c(FALSE, consecutiveROI)]

> a <- aggregate(d$x, list(newindex = newindex), mean)

And the same for dur, with sum in place of mean.  You can get the
unique rows of d with

> d[a$newindex,]

There may be bugs, but I think this general approach will work well.
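
One likely problem: as far as I can tell, the index assignment above
only pairs up two neighbouring rows, so a run of three or more equal
roi values would be split.  Below is a sketch of one way around that,
not a tested solution for the full data set.  It keeps the
consecutiveROI comparison but builds the index with cumsum(), and it
uses tapply() rather than aggregate(); the data frame is just the six
example rows from the question.

```r
# Example rows from the question, with their original row names.
d <- data.frame(
  rt   = c(5523, 5523, 5523, 5523, 4016, 4016),
  dur  = c(200, 52, 209, 188, 264, 195),
  tid  = c(4, 4, 4, 4, 5, 5),
  mood = c("subj", "subj", "subj", "subj", "indic", "indic"),
  roi  = c(9, 7, 4, 4, 9, 4),
  x    = c(5, 31, 9, 7, 51, 14),
  row.names = c("55", "56", "57", "58", "70", "71"),
  stringsAsFactors = FALSE
)

# TRUE where a row has the same roi as the row before it.
consecutiveROI <- d$roi[-1] == d$roi[-nrow(d)]

# Start a new block at row 1 and wherever roi changes; the running
# count of block starts is a block number shared by the whole run.
newindex <- cumsum(c(TRUE, !consecutiveROI))

# Keep the first row of every block, then overwrite dur and x with
# the per-block sum and mean.
result <- d[!duplicated(newindex), ]
result$dur <- as.vector(tapply(d$dur, newindex, sum))
result$x   <- as.vector(tapply(d$x, newindex, mean))
```

Because newindex never decreases, the blocks come out of tapply() in
the original row order, so the aggregated values line up with the
rows kept in result.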

--Gray

On Tue, Dec 8, 2009 at 6:50 AM, Titus von der Malsburg
<malsburg at gmail.com> wrote:
> Hi, I have a data frame and want to merge adjacent rows if some condition is
> met.  There's an obvious solution using a loop but it is prohibitively slow
> because my data frame is large.  Is there an efficient canonical solution for
> that?
>
>> head(d)
>     rt dur tid  mood roi  x
> 55 5523 200   4  subj   9  5
> 56 5523  52   4  subj   7 31
> 57 5523 209   4  subj   4  9
> 58 5523 188   4  subj   4  7
> 70 4016 264   5 indic   9 51
> 71 4016 195   5 indic   4 14
>
> The desired result would have consecutive rows with the same roi value merged.
> dur values should be added and x values averaged, other values don't differ in
> these rows and should stay the same.
>
>> head(result)
>     rt dur tid  mood roi  x
> 55 5523 200   4  subj   9  5
> 56 5523  52   4  subj   7 31
> 57 5523 397   4  subj   4  8
> 70 4016 264   5 indic   9 51
> 71 4016 195   5 indic   4 14
>
> There's also a solution using reshape.  It uses an index for blocks
>
>  d$index <- cumsum(c(TRUE,diff(d$roi)!=0))
>
> melts and then casts for every column using an appropriate fun.aggregate.
> However, this is a bit cumbersome and also I'm not sure how to make sure that
> I get the original order of rows.
>
> Thanks for any suggestion.
>
>  Titus
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



--
Gray Calhoun

Assistant Professor of Economics
Iowa State University






