[R] How to conditionally remove dataframe rows?

Marc Schwartz marc_schwartz at me.com
Thu Mar 7 14:43:07 CET 2013


Just to add another option to what Arun has provided below. Arun's approach generalizes well to data frames with more than two columns, where you want to filter on a maximum value (or other, perhaps more complex, criteria) within one or more grouping columns and return all of the columns in the original data frame.

In this special case of a two-column data frame, you can use ?aggregate with a formula-based approach that might be easier to read. aggregate() essentially encapsulates the split/apply/combine steps that Arun has written out below.

Thus:

> DF
  Point_counts Psi_Sp
1            A      0
2            A      1
3            B      1
4            B      2
5            B      0
6            C      1
7            D      1
8            D      2


> aggregate(Psi_Sp ~ Point_counts, data = DF, max)
  Point_counts Psi_Sp
1            A      1
2            B      2
3            C      1
4            D      2
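
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds DF with data.frame() and compares the two cases discussed above. The ave()-based row filter is only one possible way to keep all columns of the original data frame; it is not taken from Arun's code, and the object names (res_agg, is_max, res_rows) are made up for illustration.

```r
# Rebuild the example data from this thread
DF <- data.frame(
  Point_counts = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Psi_Sp       = c(0, 1, 1, 2, 0, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Two-column case: one row per group holding the group maximum
res_agg <- aggregate(Psi_Sp ~ Point_counts, data = DF, FUN = max)

# General case (assumption: you want whole rows, including any extra columns).
# ave() returns each row's group maximum, aligned with the rows of DF.
is_max   <- DF$Psi_Sp == ave(DF$Psi_Sp, DF$Point_counts, FUN = max)
res_rows <- DF[is_max, ]

# If a group has ties at the maximum, keep only the first such row
res_rows <- res_rows[!duplicated(res_rows$Point_counts), ]
row.names(res_rows) <- NULL
```

With only two columns both results contain the same rows. With more columns, aggregate() would summarize each column separately, whereas the row filter keeps each selected row intact.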


Regards,

Marc Schwartz


On Mar 6, 2013, at 8:42 PM, arun <smartpink111 at yahoo.com> wrote:

> Hi,
> 
> dfrm<- read.table(text="
>         Point_counts      Psi_Sp
> 
> 1            A                      0
> 2            A                      1
> 3            B                      1
> 4            B                      2
> 5            B                      0
> 6            C                      1
> 7            D                      1
> 8            D                      2
> ",sep="",header=TRUE,stringsAsFactors=FALSE)
>  res<-do.call(rbind,lapply(split(dfrm,dfrm$Point_counts),function(x) x[which.max(x$Psi_Sp),]))
>  row.names(res)<-1:nrow(res)
>  # Point_counts Psi_Sp
> #1            A      1
> #2            B      2
> #3            C      1 #your input data doesn't have 0
> #4            D      2
> A.K.
> 
> 
> 
> ----- Original Message -----
> From: Francisco Carvalho Diniz <chicocdiniz at gmail.com>
> To: r-help at r-project.org
> Cc: 
> Sent: Wednesday, March 6, 2013 6:21 PM
> Subject: [R] Fwd: How to conditionally remove dataframe rows?
> 
> Hi,
> 
> I have a data frame with two columns. I need to remove rows that are
> duplicated in the first column, but I need to do it conditionally on the
> values of the second column.
> 
> Example:
> 
>         Point_counts       Psi_Sp
> 
> 1            A                       0
> 2            A                       1
> 3            B                       1
> 4            B                       2
> 5            B                       0
> 6            C                       1
> 7            D                       1
> 8            D                       2
> 
> 
> I need to turn this data frame into one without duplicated rows in
> Point_counts (one visit per point) but keep the rows with the maximum value
> of Psi_Sp, e.g. remove row 1 and keep row 2, or remove rows 3 and 5 and
> keep row 4. At the end I want a data frame like the one below:
> 
>          Point_counts           Psi_Sp
> 
> 1              A                           1
> 2              B                           2
> 3              C                           0
> 4              D                           2
> 
> How can I do it? I found several ways to edit data frames, but
> unfortunately I could not use any of them.
> 
> I appreciate
> 
> Francisco


