[R] How to delete a duplicate observation

jim holtman jholtman at gmail.com
Thu Sep 13 20:41:01 CEST 2007


Here is a way of doing it:

> x <- cbind(V1=sample(1:3,20,TRUE), V2=sample(1:3,20,TRUE), V3=sample(20))
> x
      V1 V2 V3
 [1,]  2  2  1
 [2,]  1  2  6
 [3,]  3  2 10
 [4,]  3  1 11
 [5,]  3  2  5
 [6,]  3  2  7
 [7,]  2  1 19
 [8,]  3  3 13
 [9,]  1  3  2
[10,]  3  3 20
[11,]  3  3 18
[12,]  2  1  4
[13,]  3  2  3
[14,]  3  2 12
[15,]  3  1 17
[16,]  2  3  9
[17,]  2  3  8
[18,]  1  1 16
[19,]  3  2 15
[20,]  3  3 14
> x.max <- do.call('rbind', by(x, list(x[,1], x[,2]), function(.sub){
+     .sub[which.max(.sub[,3]),]
+ }))
> x.max
   V1 V2 V3
18  1  1 16
7   2  1 19
15  3  1 17
2   1  2  6
5   2  2  1
19  3  2 15
9   1  3  2
16  2  3  9
10  3  3 20
>
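
A possible base-R alternative, offered only as a sketch: sort so that the largest V3 comes first within each (V1, V2) pair, then drop the later duplicates of that pair. The data frame `d` and its values below are made up for illustration and are not from the original post. If two rows tie on V3, only one of them is kept.

```r
# Hypothetical example data (an assumption, not from the original post)
d <- data.frame(V1 = c(3, 3, 1, 2, 2),
                V2 = c(3, 3, 2, 1, 1),
                V3 = c(1, 4, 6, 19, 4))

# Sort by V1, then V2, then V3 in decreasing order within each pair
d.sorted <- d[order(d$V1, d$V2, -d$V3), ]

# Keep only the first row of each (V1, V2) pair, i.e. the one with the largest V3
d.max <- d.sorted[!duplicated(d.sorted[, c("V1", "V2")]), ]
d.max
```

Unlike the by() version, this returns the rows ordered by V1 and then V2, and it keeps the original row names.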


On 9/13/07, Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
> nuyaying wrote:
> > I have a data set with 3 variables V1, V2, V3.  If 2 data points have
> > the same values on both V1 and V2, I want to delete the one with the
> > smaller V3 value, i.e., in the data below, I want to delete the first
> > observation.  How can I do that?  Thanks in advance!
> >
> > V1  V2  V3
> > 3    3     1
> > 3    3     4
> >
> >
> Tricky one... I think something like this should work:
>
> l <- split(d$V3, list(d$V1,d$V2))
> ixl <- lapply(l, function(x) {
>   if ((n <- length(x)) == 2)
>      seq_len(n) != which.min(x)
>   else
>      rep(TRUE, n)
> })
> ix <- unsplit(ixl, list(d$V1,d$V2))
> d[ix,]
>
> --
>   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
>
>
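
The idea in the quoted message, building a logical index per (V1, V2) group, can also be sketched with ave(), which returns a group-wise result aligned with the original rows. This is an illustration on made-up data (the data frame `d` is an assumption), not the code from the quoted message. It works for groups of any size, but it keeps every row that ties for the group maximum.

```r
# Hypothetical example data (an assumption, not from the original post)
d <- data.frame(V1 = c(3, 3, 1, 2, 2),
                V2 = c(3, 3, 2, 1, 1),
                V3 = c(1, 4, 6, 19, 4))

# Group-wise maximum of V3, repeated for every row of its (V1, V2) group
grp.max <- ave(d$V3, d$V1, d$V2, FUN = max)

# TRUE for the rows that hold their group's maximum V3
ix <- d$V3 == grp.max
d[ix, ]
```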


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


