[R] extract rows in dataframe with duplicated column values

Tiago R Magalhaes tiago17 at socrates.Berkeley.EDU
Fri Mar 18 19:21:44 CET 2005


Thank you very much to Andy Liaw, Rob J Goedman and Marc Schwartz for 
taking the time to solve my problem. I've learned on many other 
occasions from useful tips coming from all 3 of them, and it just 
happened once again. You've got to love this mailing list...

subset(x, a %in% a[duplicated(a)])

works in all cases and is the simplest, but, as always, all the 
solutions helped me understand R's concepts and functions a little 
better.

I would suggest including this in the help page for duplicated.
Also useful might be:

subset(x, !a %in% a[duplicated(a)])

which gives all rows whose value of a is not duplicated anywhere in 
the column.
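
For anyone reading this in the archives, here is a small self-contained 
sketch of the two calls above, run on the example data frame from my 
original question. The names dups, uniq and dups2 are only illustrative. 
The last line is an alternative that assumes a version of R in which 
duplicated() accepts the fromLast argument; it is not one of the 
solutions posted in this thread.

```r
# Example data from the original question
x <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = 10)

# Rows whose value of 'a' occurs more than once in the column
dups <- subset(x, a %in% a[duplicated(a)])

# Rows whose value of 'a' occurs exactly once
# (%in% binds tighter than !, so this reads as !(a %in% ...))
uniq <- subset(x, !a %in% a[duplicated(a)])

# Alternative, assuming duplicated() supports fromLast:
# flag a row if its value repeats scanning from either end
dups2 <- x[duplicated(x$a) | duplicated(x$a, fromLast = TRUE), ]
```

With this data, dups and dups2 should both hold rows 2 to 6 and uniq 
only row 1.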

Again, thanks for all the help on this mailing list.


>Here's one more possibility:
>
>  > subset(x, a %in% a[duplicated(a)])
>   a  b
>2 2 10
>3 2 10
>4 3 10
>5 3 10
>6 3 10
>
>HTH,
>
>Marc Schwartz
>
>
>On Thu, 2005-03-17 at 22:25 -0500, Liaw, Andy wrote:
>>  OK, strike one...
>>
>>  Here's my second try:
>>
>>  > cnt <- table(x[,1])
>>  > v <- as.numeric(names(cnt[cnt > 1]))
>>  > v
>>  [1] 2 3
>>  > x[x[,1] %in% v, ]
>>    a  b
>>  2 2 10
>>  3 2 10
>>  4 3 10
>>  5 3 10
>>  6 3 10
>>
>>  Andy
>>
>>  > From: Liaw, Andy
>>  >
>>  > Does this work for you?
>>  >
>>  > > x[table(x[,1]) > 1,]
>>  >   a  b
>>  > 2 2 10
>>  > 3 2 10
>>  > 5 3 10
>>  > 6 3 10
>>  >
>>  > Andy
>>  >
>>  > > From: Tiago R Magalhaes
>>  > >
>>  > > Hi
>>  > >
>>  > > I want to extract all the rows in a data frame that have duplicates
>>  > > for a given column.
>>  > > I would expect this question to come up pretty often but I have
>>  > > researched the archives and surprisingly couldn't find anything.
>>  > > The best I can come up with is:
>>  > >
>>  > > x <- data.frame(a=c(1,2,2,3,3,3), b=10)
>>  > > xdup1 <- duplicated(x[,1])
>>  > > xdup2 <- duplicated(x[,1][nrow(x):1])[nrow(x):1]
>>  > > xAllDups <- x[(xdup1+xdup2)!=0,]
>>  > >
>>  > > This seems to work, but it's so convoluted that I'm sure there's a
>>  > > better method.
>>  > > Thanks for any help and enlightenment



