[R] Capping outliers

Aher ajit.aher at cedar-consulting.com
Tue Nov 22 07:06:34 CET 2011


Hi Experts,

I am new to R and am using the sample code below to cap outliers at
percentile values. On large data (30000 observations and 150 variables),
the loop that detects outliers and caps them at the lower/upper percentile
value takes a long time to run.
Is there anything wrong with the code? Can anyone suggest improvements to
the script to speed it up?
min_pctle_cut <- 0.01
max_pctle_cut <- 0.99
library(outliers)

n <- 100
x1 <- runif(n) 
x2 <- runif(n) 
x3 <- x1 + x2 + runif(n)/10 
x4 <- x1 + x2 + x3 + runif(n)/10 
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE)) 
x6 <- factor(1*(x5=='a' | x5=='c')) 
# data.frame() keeps x5 and x6 as factors; cbind() would coerce them to integer codes
x <- data.frame(x1,x2,x3,x4,x5,x6)

z <- x[,sapply(x,is.numeric)]

qs <- sapply(z, function(z) quantile(z,
 	c(min_pctle_cut, max_pctle_cut), na.rm = TRUE)) 


# The loop below takes a long time to execute

system.time(for (i in 1:ncol(z)) {
    for (j in 1:nrow(z)) {
        # cap values below the lower percentile and above the upper percentile
        if (z[j,i] < qs[1,i]) z[j,i] <- qs[1,i]
        if (z[j,i] > qs[2,i]) z[j,i] <- qs[2,i]
    }
})
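
For comparison, the same capping could probably be done one column at a
time instead of one cell at a time. The sketch below is only an illustration
under stated assumptions: cap_column is a hypothetical helper name, and demo
is made-up data rather than the real data set. It relies on base R's pmin()
and pmax(), which work on whole vectors.

```r
# Hypothetical helper: cap a numeric vector at its own lower/upper percentiles.
cap_column <- function(v, lower = 0.01, upper = 0.99) {
  q <- quantile(v, c(lower, upper), na.rm = TRUE, names = FALSE)
  # pmax() raises values below the lower cut; pmin() lowers values above the upper cut
  pmin(pmax(v, q[1]), q[2])
}

# Made-up example data with two obvious outliers in column a
set.seed(1)
demo <- data.frame(a = c(rnorm(98), -50, 50), b = runif(100))

# Apply the helper to every column; capped[] keeps the data.frame structure
capped <- demo
capped[] <- lapply(demo, cap_column)
```

With the objects from the script above, the equivalent call would be
z[] <- lapply(z, cap_column, lower = min_pctle_cut, upper = max_pctle_cut).
Unlike the loop, this version leaves NA values as NA instead of stopping
with an error in the if() test.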





