[R] Filtering out bad data points

Bill.Venables at csiro.au
Tue May 10 03:51:44 CEST 2011


You could use a function to do the job:

withinRange <- function(x, r = quantile(x, c(0.05, 0.95)))
    x >= r[1] & x <= r[2]

dtest2 <- subset(dftest, withinRange(x)) 
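
As a self-contained sketch of how this might be used (the seed, the explicit limits c(-1, 1), and the names dtest3 and dtest4 are illustrative assumptions, not part of the original question), the function returns a logical vector, so it can drive either subset() or ordinary logical indexing. A logical filter also removes the need for the length check in the original code: a negative index built from an empty index vector selects no rows at all, whereas an all-TRUE logical vector simply keeps every row.

```r
# Illustrative data, mirroring the data frame in the question.
set.seed(1)  # assumed seed, only for reproducibility
dftest <- data.frame(x = rnorm(100), y = rnorm(100))

# TRUE where x lies inside the range r (default: the 5% and 95% quantiles of x).
withinRange <- function(x, r = quantile(x, c(0.05, 0.95)))
    x >= r[1] & x <= r[2]

# Keep only the rows whose x is within the default quantile range.
dtest2 <- subset(dftest, withinRange(x))

# The same filter written with plain logical indexing.
dtest3 <- dftest[withinRange(dftest$x), ]

# Explicit limits can be supplied instead of quantiles.
dtest4 <- subset(dftest, withinRange(x, c(-1, 1)))

# Why the original needed a length check: an empty negative index
# selects zero rows rather than leaving the data frame untouched.
nrow(dftest[-integer(0), ])
```

Because withinRange() never builds an index vector, there is no empty-index special case to guard against.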

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Robert A'gata
Sent: Tuesday, 10 May 2011 10:57 AM
To: r-help at r-project.org
Subject: [R] Filtering out bad data points

Hi,

I have a recurring question about how best to do this in R. I have a data
frame and a set of criteria for filtering points out. My procedure is always
to locate the indices of those points, check whether the index vector has
length greater than 0, and then remove them. For example:

dftest <- data.frame(x=rnorm(100),y=rnorm(100));
qtile <- quantile(dftest$x,probs=c(0.05,0.95));
badIdx <- which((dftest$x < qtile[1]) | (dftest$x > qtile[2]));
if (length(badIdx) > 0) {
    dftest <- dftest[-badIdx,];
}

My question: is there a more streamlined way to achieve this? Thank you.

Cheers,

Robert
