[R] how to remove rows in which 2 or more observations are smaller than a given threshold?

William Dunlap wdunlap at tibco.com
Sun Feb 27 01:34:21 CET 2011


You didn't say whether your data set was a matrix or a data.frame.
Here are two functions that do the job on either, and one that
only works with data.frames but is faster (a similar speedup
is available for matrices as well).  They all compute the
number of small values (absolute value <= 1.58) in each row,
nSmall, and extract the rows for which nSmall is less than 2.

f0 <- function (x) {
   # count the small values row by row
   nSmall <- apply(x, 1, function(row) sum(abs(row) <= 1.58))
   x[nSmall < 2, , drop = FALSE]
}
f1 <- function (x) {
   # vectorized count; <= matches the "less or equal" rule in the question
   nSmall <- rowSums(abs(x) <= 1.58)
   x[nSmall < 2, , drop = FALSE]
}
f2 <- function (x) {
    stopifnot(is.data.frame(x))
    nSmall <- 0
    # accumulate the count one column at a time
    for (column in x) {
        nSmall <- nSmall + (abs(column) <= 1.58)
    }
    x[nSmall < 2, , drop = FALSE]
}
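
The matrix speedup mentioned above might look something like the
following. This is only a sketch: the name f2m and the cutoff argument
are made up for illustration, and the cutoff follows the question's rule
(absolute value <= 1.58). It accumulates the count one column at a time,
in the same spirit as the data.frame-only version.

```r
# Hypothetical matrix analogue of the column-wise data.frame function.
# 'f2m' and its 'cutoff' argument are illustrative names only.
f2m <- function(x, cutoff = 1.58) {
    stopifnot(is.matrix(x))
    nSmall <- integer(nrow(x))
    for (j in seq_len(ncol(x))) {
        # a logical vector added to an integer vector counts the TRUEs
        nSmall <- nSmall + (abs(x[, j]) <= cutoff)
    }
    x[nSmall < 2, , drop = FALSE]
}

# Tiny example: row "b" has two small values and is dropped;
# row "c" has exactly one (|-1.58| <= 1.58) and is kept.
m <- rbind(a = c(3, -2, 5),
           b = c(1, 0.5, 4),
           c = c(-1.58, 2, 7))
kept <- f2m(m)
```

Whether this beats rowSums() on a matrix will depend on the dimensions,
so it is worth timing on your own data.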

For a 10^5 row by 50 column data.frame I got the
following times:
  > system.time(r0 <- f0(z))
     user  system elapsed 
     2.39    0.04    2.51 
  > system.time(r1 <- f1(z))
     user  system elapsed 
     0.42    0.08    0.51 
  > system.time(r2 <- f2(z))
     user  system elapsed 
     0.21    0.05    0.24 
  > identical(r0, r1) && identical(r0, r2)
  [1] TRUE
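
The data.frame z used for the timings is not shown above. One way to
build a stand-in of the same shape and check that the row-wise and
vectorized counts agree is sketched below; the random data, the seed,
and the sd value are assumptions, not the original test data.

```r
# Hypothetical stand-in for 'z': 10^5 rows by 50 columns of random numbers.
set.seed(1)
z <- as.data.frame(matrix(rnorm(1e5 * 50, sd = 10), ncol = 50))

# Count the small values per row in two ways and confirm they agree.
n_apply   <- apply(z, 1, function(row) sum(abs(row) <= 1.58))
n_rowsums <- rowSums(abs(z) <= 1.58)
stopifnot(all(n_apply == n_rowsums))

# Keep the rows with fewer than 2 small values, timing the subsetting step.
system.time(kept <- z[n_rowsums < 2, , drop = FALSE])
```

Timings will of course differ from machine to machine.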

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of hind lazrak
> Sent: Saturday, February 26, 2011 3:37 PM
> To: r-help at r-project.org
> Subject: [R] how to remove rows in which 2 or more 
> observations are smaller than a given threshold?
> 
> Hello
> 
> The data set I am examining has 7425 observations (rows with unique
> identifiers) and 46 samples(columns).
> 
> I have been trying to generate a dataset that filters out observations
> that are "negligible"
> The definition of "negligible" is absolute value less or 
> equal  to 1.58.
> 
> The rule that I would like to adopt to create a new data is: drop rows
> in which 2 or more observations have absolute values <= 1.58.
> 
> Since I have unique identifier per row, I have tried to reshape the
> data so I could create a new variable using an ifelse statement that
> would flag observations <=1.58 but I am not getting anywhere with this
> approach
> 
> I could not come up with an apply function that counts the number of
> observations for which the absolute values are below the cutoff I've
> specified.
> 
> All observations are numerical and  I don't have missing values.
> 
> 
> Thank you in advance for the help,
> 
> Hind
> 


