[R] subsetting by groups, with conditions

Gabor Grothendieck ggrothendieck at gmail.com
Tue Dec 29 02:27:45 CET 2009


Assuming your data frame is called DF, we can use sqldf like this. The
inner select calculates the maximum AreaPoly2 within each P1id group,
considering only rows with Veg1 = Veg2, and the outer select returns the
row of that group having that value.


library(sqldf)
sqldf("select * from DF a where Veg1 = Veg2 and AreaPoly2 =
      (select max(AreaPoly2) from DF where Veg1 = Veg2 and P1id = a.P1id)")

Running it looks like this:

> library(sqldf)
> sqldf("select * from DF a where Veg1 = Veg2 and AreaPoly2 =
+       (select max(AreaPoly2) from DF where Veg1 = Veg2 and P1id = a.P1id)")
  P1id Veg1 Veg2 AreaPoly2 P2ID
1    1    p    p       1.5    2
2    2    p    p       2.0    3
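
If sqldf is not available, the same subset can be taken in base R. This is
a sketch of an alternative, not the sqldf approach: it rebuilds DF from the
data in the question (the column types are an assumption), drops rows where
Veg1 differs from Veg2, and then uses ave() to compare each row's AreaPoly2
with the maximum of its P1id group. The as.character() calls guard against
Veg1 and Veg2 being factors with different level sets, in which case == on
the factors themselves would fail. As with the SQL query, tied rows are all
kept.

```r
# Data from the question; building DF here is an assumption for illustration
DF <- data.frame(
  P1id      = c(1, 1, 2, 2),
  Veg1      = c("p", "p", "p", "p"),
  Veg2      = c("p", "p", "p", "h"),
  AreaPoly2 = c(1, 1.5, 2, 3.5),
  P2ID      = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Keep only rows whose two vegetation codes agree
# (as.character() makes this work even if Veg1/Veg2 are factors)
S <- subset(DF, as.character(Veg1) == as.character(Veg2))

# For every row of S, ave() gives the maximum AreaPoly2 within its P1id group
grp_max <- ave(S$AreaPoly2, S$P1id, FUN = max)

# Keep the row(s) attaining their group's maximum; ties are all retained
result <- S[S$AreaPoly2 == grp_max, ]
rownames(result) <- NULL
print(result)
```

Printing result should give the same two rows as the sqldf output above.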


On Mon, Dec 28, 2009 at 8:03 PM, Seth W Bigelow <sbigelow at fs.fed.us> wrote:
> I have a data set similar to this:
>
> P1id    Veg1    Veg2    AreaPoly2       P2ID
> 1       p       p       1               1
> 1       p       p       1.5             2
> 2       p       p       2               3
> 2       p       h       3.5             4
>
> For each group of P1id records, I wish to output (subset) the record
> with the largest AreaPoly2 value among those where Veg1 = Veg2. For this
> example, the desired dataset would be
>
> P1id    Veg1    Veg2    AreaPoly2       P2ID
> 1       p       p       1.5             2
> 2       p       p       2               3
>
> Can anyone point me in the right direction on this?
>
> Dr. Seth W. Bigelow
> Biologist, USDA-FS Pacific Southwest Research Station
> 1731 Research Park Drive, Davis California