[R] selecting rows with more than x occurrences in a given column (data type is names)

Mike Jasper mikejjasper at gmail.com
Tue Mar 13 15:38:57 CET 2007


Despite a long search of the archives, I couldn't find how to do this.
Thanks in advance for help with what is likely a simple issue.

I have a data set where the first column, "names", holds a person's
name (e.g., 'Joe Smith' or 'Jane Doe'). The following columns are data
associated with that person. Many people have multiple rows. What I
want is a new data frame containing only the rows for people who have
more than x occurrences in the first column.

Here's what I've tried, which isn't working:

Let's call my old data.frame "all.data"

table(all.data$names)>10

I get a table of names with TRUE/FALSE values. I then want to collect
the names that are TRUE and pass them to some subset-type command like

dup.names=table(all.data$names)>10

new.data=(all.data[all.data$names==dup.names,])

That's not working because the dimensions are wrong (I think). But
even when I tried to do part of it manually (to troubleshoot) like
this

dup.names=c('Joe Smith','Jane Doe','etc')

I got warnings and it didn't work correctly. There must be a simple
way to do this that I'm just not seeing. Thanks.
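
For reference, here is a minimal sketch of one way this could be done.
It is not the only approach. The small data frame, the "value" column
and the threshold x = 1 are made up for illustration; only all.data,
names, dup.names and new.data come from the post above. The likely
problem with the attempts above is twofold: table(...) > 10 yields
TRUE/FALSE values rather than the names themselves, and == compares
element by element (recycling the shorter vector), whereas %in% tests
membership in a set of names.

```r
# Hypothetical stand-in for all.data; the real data would have more columns.
all.data <- data.frame(
  names = c("Joe Smith", "Jane Doe", "Joe Smith",
            "Ann Lee", "Joe Smith", "Jane Doe"),
  value = 1:6,                      # made-up data column
  stringsAsFactors = FALSE
)

x <- 1                              # assumed threshold; the post uses 10

# Count how many rows each name has.
counts <- table(all.data$names)

# Keep the names themselves (not the TRUE/FALSE flags) whose count exceeds x.
dup.names <- names(counts)[counts > x]

# %in% checks set membership, so every row whose name is in dup.names is kept.
new.data <- all.data[all.data$names %in% dup.names, ]
```

With the real data, replacing x <- 1 by x <- 10 would keep only people
with more than 10 rows.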


