[R] exclude columns with at least three consecutive zeros

William Dunlap wdunlap at tibco.com
Wed Oct 12 19:23:39 CEST 2011


First define a function that returns TRUE if a column
should be dropped.  E.g.,

  has3Zeros.1 <- function(x) 
  {
      x <- x[!is.na(x)] == 0 # drop NA's, convert 0's to TRUE, others to FALSE
      if (length(x) < 3) {
          FALSE # you may want to further test short vectors
      } else {
          i <- seq_len(length(x) - 2)
          any(x[i] & x[i + 1] & x[i + 2])
      }
  }

or

  has3Zeros.2 <- function (x) 
  {
      x <- x[!is.na(x)] == 0
      r <- rle(x)
      any(r$lengths[r$values] >= 3)
  }
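
As a quick sanity check, the two definitions above can be compared
on a few small vectors. This is only a sketch: the vectors in `tests`
are made up for illustration and are not from the original data.

```r
# Both definitions, repeated here so the snippet runs on its own.
has3Zeros.1 <- function(x)
{
    x <- x[!is.na(x)] == 0 # drop NA's, then TRUE where the value is 0
    if (length(x) < 3) {
        FALSE
    } else {
        i <- seq_len(length(x) - 2)
        any(x[i] & x[i + 1] & x[i + 2])
    }
}

has3Zeros.2 <- function(x)
{
    x <- x[!is.na(x)] == 0
    r <- rle(x)
    any(r$lengths[r$values] >= 3)
}

# Hypothetical test vectors, including NA-skipping and empty cases.
tests <- list(c(1, 0, 0, 0, 2),  # plain run of three zeros
              c(0, NA, 0, 0),    # run of three once the NA is skipped
              c(0, 0, 1, 0),     # longest run is only two
              c(NA, NA),         # nothing left after dropping NA's
              numeric(0))        # empty input

r1 <- vapply(tests, has3Zeros.1, logical(1))
r2 <- vapply(tests, has3Zeros.2, logical(1))
stopifnot(identical(r1, r2))     # the two versions should agree
r1
```

Both versions should flag only the first two vectors.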

Then use sapply on your data.frame with this function to see which
columns to omit, and use [ to omit them:
  > e <- data.frame(Date=1980:1985,
  +                 A = c(2, 9, 18, 0, 12, 48),
  +                 B = c(75, NA, 15, 16, 43, 3),
  +                 C = c(12, 7, 0, 0, 0, 26),
  +                 D = c(41, 0, 0, NA, 0, 21))
  > e[, !sapply(e, has3Zeros.1), drop=FALSE]
    Date  A  B
  1 1980  2 75
  2 1981  9 NA
  3 1982 18 15
  4 1983  0 16
  5 1984 12 43
  6 1985 48  3
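
If the required run length might change, the rle-based version can
take it as an argument. The following is a sketch under that
assumption: the name `hasZeroRun` and its parameter `n` are
hypothetical, not part of the functions above, and vapply is used
in place of sapply only so that the result is guaranteed to be a
logical vector.

```r
# Hypothetical generalization: TRUE if x has a run of at least n zeros
# after NA's are dropped.
hasZeroRun <- function(x, n = 3)
{
    isZero <- x[!is.na(x)] == 0
    r <- rle(isZero)
    any(r$lengths[r$values] >= n)
}

e <- data.frame(Date = 1980:1985,
                A = c(2, 9, 18, 0, 12, 48),
                B = c(75, NA, 15, 16, 43, 3),
                C = c(12, 7, 0, 0, 0, 26),
                D = c(41, 0, 0, NA, 0, 21))

# One logical per column; extra arguments are passed on to hasZeroRun.
drop <- vapply(e, hasZeroRun, logical(1), n = 3)

kept <- e[, !drop, drop = FALSE]  # columns without such a run
dropped <- names(e)[drop]         # names of the omitted columns
dropped
```

With n = 3 this should omit the same columns, C and D, as the
example above.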

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Samir Benzerfa
> Sent: Wednesday, October 12, 2011 8:35 AM
> To: r-help at r-project.org
> Subject: [R] exclude columns with at least three consecutive zeros
> 
> Hi everyone,
> 
> 
> 
> I have a large data set with about 3'000 columns and I would like to exclude
> all columns which include three or more consecutive zeros (see below
> example). A further issue is that it should just jump NA values if any. How
> can I do this?
> 
> 
> 
> In the below example R should exclude column C and D (since in D jumping the
> NA leaves three consecutive zeros).
> 
> 
> 
> I would appreciate any solutions to this issue.
> 
> 
> 
> Many thanks!
> 
> S.B.
> 
> 
> 
> Date      A             B             C             D
> 1980      2             75            12            41
> 1981      9             NA            7             0
> 1982      18            15            0             0
> 1983      0             16            0             NA
> 1984      12            43            0             0
> 1985      48            3             26            21