[R] Choose between duplicated rows

Tyler Rinker tyler_rinker at hotmail.com
Sat Apr 14 22:15:36 CEST 2012



My solution:
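# Split by every (id, A) combination
SP <- split(df, df[, 1:2])
# Keep the single row with the smallest value in column `col`
minner <- function(x, col = 'numMiss') {
    x[which.min(unlist(x[, col])), , drop = FALSE]
}
# Step 1: fewest missing values within each (id, A) group
NEW <- do.call('rbind', lapply(SP, minner))
# Step 2: smallest A within each id
SP2 <- split(NEW, NEW[, 'id'])
do.call('rbind', lapply(SP2, function(x) minner(x, 'A')))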
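
For comparison, here is a minimal sketch of the same two-step selection done with order() and duplicated() instead of split()/lapply(). It assumes the df from the question quoted below (copied here so the snippet runs on its own); step1 and step2 are names made up for illustration. Ties are resolved by keeping the first row in sorted order.

```r
# Data frame copied from the original question.
df <- data.frame(
  id = c("id1", "id1", "id1", "id2", "id2", "id2"),
  A = c(11905, 11907, 11907, 11829, 11829, 11829),
  v1 = c(NA, 3, NA, 1, 2, NA),
  v2 = c(NA, 2, NA, 2, NA, NA),
  v3 = c(NA, 1, NA, 1, NA, NA),
  v4 = c("N", "Y", "N", "Y", "N", "N"),
  v5 = c(0, 0, 0, 1, 0, 0),
  numMiss = c(3, 0, 3, 0, 2, 3)
)

# Step 1: sort so the smallest numMiss comes first within each (id, A),
# then drop every later row that repeats an (id, A) pair.
step1 <- df[order(df$id, df$A, df$numMiss), ]
step1 <- step1[!duplicated(step1[, c("id", "A")]), ]

# Step 2: sort so the smallest A comes first within each id,
# then drop every later row that repeats an id.
step2 <- step1[order(step1$id, step1$A), ]
step2 <- step2[!duplicated(step2$id), ]
step2
```

On this df, step1 holds original rows 1, 2 and 4, and step2 holds rows 1 and 4, which matches the output asked for in the question.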

Cheers,
Tyler

> Date: Sat, 14 Apr 2012 12:03:36 -0700
> From: francy.casalino at gmail.com
> To: r-help at r-project.org
> Subject: [R] Choose between duplicated rows
> 
> Dear r experts,
> 
> Sorry for this basic question, but I can't seem to find a solution…
> 
> I have this data frame:
> df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"), A =
> c(11905, 11907, 11907, 11829, 11829, 11829), v1 = c(NA, 3, NA,1,2,NA), v2 =
> c(NA,2,NA, 2, NA,NA), v3 = c(NA,1,NA,1,NA,NA), v4 = c("N", "Y", "N", "Y",
> "N","N"), v5 = c(0,0,0,1,0,0), numMiss=c(3,0,3,0,2,3))
> 
> > df
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 2 id1 11907  3  2  1  Y  0       0
> 3 id1 11907 NA NA NA  N  0       3
> 4 id2 11829  1  2  1  Y  1       0
> 5 id2 11829  2 NA NA  N  0       2
> 6 id2 11829 NA NA NA  N  0       3
> 
> 
> Of the rows that share the same value of "A" within an id, I need to keep
> only the one with the fewest missing values across all the variables (i.e.
> with min(numMiss)), to get this:
> 
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 2 id1 11907  3  2  1  Y  0       0
> 4 id2 11829  1  2  1  Y  1       0
> 
> Then, of the rows that have the same id, I have to choose the record with
> the smallest value of "A", like this:
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 4 id2 11829  1  2  1  Y  1       0
> 
> For groupings I have used the package "plyr" before, but this would involve
> a sort of double grouping, by id and by duplicated values of A… Could you
> please help me understand how this can be done?
> 
> Thank you very much.
> -f
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Choose-between-duplicated-rows-tp4557833p4557833.html