[R] selecting rows with more than x occurrences in a given column (data type is names)

Stephen Tucker brown_emu at yahoo.com
Tue Mar 13 15:59:02 CET 2007


This isn't pretty, but should work:

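x <- 10  # keep names that occur more than x times
y <- split(all.data, f = all.data$names)  # list of data frames, one per name
z <- y[unlist(lapply(y, nrow)) > x]       # keep groups with more than x rows
newdata <- vector()                       # start from an empty object
for (k in z) {
  newdata <- rbind(newdata, k)            # append each kept group's rows
}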

Basically, I split your data frame into a list of data frames, one per name,
then kept the list elements whose number of rows (i.e., number of occurrences)
was greater than x, and finally appended the rows of each kept element to an
initially empty vector with rbind(). There is probably a more elegant way to do
this, but I can't think of it at the moment...
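For what it's worth, the loop can be collapsed into a single do.call(rbind, ...)
call. The sketch below is the same split-and-filter idea, not a different
method. The small data frame, its "value" column and the threshold x <- 1 are
made up purely for illustration.

```r
# Hypothetical stand-in for all.data (column "names" as in the question)
all.data <- data.frame(
  names = c("Joe Smith", "Joe Smith", "Jane Doe", "Ann Lee", "Ann Lee", "Ann Lee"),
  value = 1:6,
  stringsAsFactors = FALSE
)
x <- 1  # hypothetical threshold: keep names with more than x rows

y <- split(all.data, f = all.data$names)  # one data frame per name
z <- y[sapply(y, nrow) > x]               # keep groups with more than x rows
newdata <- do.call(rbind, unname(z))      # stack kept groups; unname() keeps original row names
print(newdata)
```

Note that the rows come out grouped by name (in the sorted order of the names
that split() uses), not in their original order.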

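You are correct that the comparison using '==' doesn't work here. '==' compares
two vectors element by element, recycling the shorter one (with a warning when
the longer length is not a multiple of the shorter), so it cannot test whether
each name belongs to a set of names. Also, table(all.data$names)>10 gives
TRUE/FALSE values, not the names themselves.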
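A membership test with base R's %in% operator, against the names whose table()
count exceeds x, does what the original attempt was aiming for. This is a
sketch only: the data frame, its "value" column, x <- 1 and the helper names
counts and keep are hypothetical.

```r
# Hypothetical stand-in for all.data (column "names" as in the question)
all.data <- data.frame(
  names = c("Joe Smith", "Joe Smith", "Jane Doe", "Ann Lee", "Ann Lee", "Ann Lee"),
  value = 1:6,
  stringsAsFactors = FALSE
)
x <- 1  # hypothetical threshold: keep names that occur more than x times

counts <- table(all.data$names)                   # occurrences per name
keep <- names(counts)[counts > x]                 # the names themselves, not TRUE/FALSE
new.data <- all.data[all.data$names %in% keep, ]  # TRUE for rows whose name is in keep
print(new.data)
```

Unlike the split() approach, this keeps the rows in their original order.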





--- Mike Jasper <mikejjasper at gmail.com> wrote:

> Despite a long search on the archives, I couldn't find how to do this.
> Thanks in advance for what is likely a simple issue.
> 
> I have a data set where the first column is name (i.e., 'Joe Smith',
> 'Jane Doe', etc). The following columns are data associated with that
> person. I have many people with multiple rows. What I want is to get a
> new data frame out with only the people who have more than x
> occurrences in the first column.
> 
> Here's what I've done, that's not working:
> 
> Let's call my old data.frame "all.data"
> 
> table(all.data$names)>10
> 
> I get a list of names and TRUE/FALSE values. I then want to make a
> list of the TRUEs and pass that to some subset type command like
> 
> dup.names=table(all.data$names)>10
> 
> new.data=(all.data[all.data$names==dup.names,])
> 
> That's not working because the dimensions are wrong (I think). But
> even when I tried to do part of it manually (to troubleshoot) like
> this
> 
> dup.names=c('Joe Smith','Jane Doe','etc')
> 
> I got warnings and it didn't work correctly. There must be a simple
> way to do this that I'm just not seeing. Thanks.
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



 


