[R] drop rare factors

Sam Steingold sds at gnu.org
Thu Jan 19 20:44:40 CET 2012


> * Sarah Goslee <fnenu.tbfyrr at tznvy.pbz> [2012-01-18 17:36:16 -0500]:
>
> Here's one way, worked out in lots of steps so you can see
> how each works:

Thanks, it all makes perfect sense, and I wrote this function based on
your instructions:

drop.levels <- function (frame, column, threshold) {
  size <- nrow(frame)
  if (threshold < 1) threshold <- threshold * size
  tab <- table(frame[column])
  keep <- names(tab)[tab >  threshold]
  drop <- names(tab)[tab <= threshold]
  cat("Keep(",column,")",length(keep)); print(tab[keep])
  cat("Drop(",column,")",length(drop)); print(tab[drop])
  frame1 <- frame[frame[column] %in% keep, ]
  size1 <- nrow(frame1)
  cat("Rows:",size,"-->",size1,"(dropped",100*(size-size1)/size,"%)\n")
  frame1[column] <- factor(frame1[column], levels=keep)
  frame1
}

Alas, I get an error:

Rows: 87392 --> 0 (dropped 100 %)
Error in `[<-.data.frame`(`*tmp*`, column, value = NA_integer_) : 
  replacement has 1 rows, data has 0

When I do everything step by step interactively, it works...
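
A likely cause, offered as a sketch rather than a confirmed diagnosis:
frame[column] returns a one-column data frame, not a vector, so %in%
treats the whole column as a single element and the row filter keeps
nothing, which would match the "0 rows" output above. Extracting the
column with [[ ]] is one way around that. The version below drops the
diagnostic printing for brevity, and the sample data frame df is made up
for illustration; it is not from the original data.

```r
# Sketch of a corrected drop.levels(); `[[` extracts the column as a vector.
drop.levels <- function (frame, column, threshold) {
  size <- nrow(frame)
  # A threshold below 1 is read as a fraction of the row count.
  if (threshold < 1) threshold <- threshold * size
  tab <- table(frame[[column]])
  keep <- names(tab)[tab > threshold]
  # Compare as character so the test works for factor and character columns.
  frame1 <- frame[as.character(frame[[column]]) %in% keep, , drop = FALSE]
  size1 <- nrow(frame1)
  cat("Rows:", size, "-->", size1, "(dropped", 100 * (size - size1) / size, "%)\n")
  # Rebuild the factor so the dropped levels disappear.
  frame1[[column]] <- factor(frame1[[column]], levels = keep)
  frame1
}

# Hypothetical sample data, not from the original thread.
df <- data.frame(f = factor(c("a", "a", "a", "b", "b", "c")), x = 1:6)
res <- drop.levels(df, "f", 1)
```

The interactive steps probably used frame$col or a similar vector
extraction, which would explain why they worked while the function
did not.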

Thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://ffii.org http://www.PetitionOnline.com/tap12009/ http://camera.org
http://palestinefacts.org http://jihadwatch.org http://pmw.org.il
Your mouse has moved - WinNT has to be restarted for this to take effect.


