[R] Unique.data.frame...still getting duplicates

Liaw, Andy andy_liaw at merck.com
Fri Jun 25 04:31:51 CEST 2004


> From: F Z
> 
> Hi there
> 
> I have a data frame with about 65,000 rows and 8 variables.  
> I am trying to 
> get rid of the double entries of a factor variable "ID" so I 
> can get a 
> unique observation for each ID
> 
> I tried:
> 
> >dupl_unique.data.frame(data[ID,]) #I obtain a data frame with 21,547 
> >observations..so far so good, but then when I check for duplicates
> 
> >d_duplicated(dupl2$ID)
> >summary(as.factor(d))
> FALSE  TRUE
>   6836 14711
> 
> Meaning that I am still getting 14,711 duplicates!
> 
> I tried changing the ID type to integer and repeated the 
> process but I got 
> identical results....what am I missing?

1.  Upgrade your version of R.  (That will teach you about using `_' for
assignment!)

2.  Call generics, not the methods; i.e., unique() instead of
unique.data.frame().

3.  You want a data frame where the IDs are unique, not the combination of
columns.  Use:

    dupl <- data[!duplicated(data$ID), ]
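
A minimal sketch of point 3 on a toy data frame, in case it helps to see the
difference between "unique rows" and "unique IDs". The data frame `dat` and
its columns are made up for illustration and are not the poster's data. The
sketch uses a logical mask from duplicated() to keep the first row seen for
each ID; which row you actually want per ID is an assumption you may need to
change.

```r
# Toy data: every row is distinct as a whole, but IDs repeat.
# (Names and values here are hypothetical, for illustration only.)
dat <- data.frame(
  ID    = factor(c("a", "b", "a", "c", "b", "a")),
  value = c(1, 2, 3, 4, 5, 6)
)

# unique() on the whole data frame compares entire rows, so nothing
# is dropped here: all six rows differ in at least one column.
rows_unique <- unique(dat)

# duplicated() flags each repeat of an ID after its first appearance;
# negating it keeps exactly one row (the first) per ID.
dupl <- dat[!duplicated(dat$ID), ]

# Sanity check: no ID occurs more than once in the result.
stopifnot(!any(duplicated(dupl$ID)))
```

The same check the poster ran, table(duplicated(dupl$ID)), should then report
only FALSE.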

BTW, where did `dupl2' come from?

Andy
 
> Thanks!