[R] Why does unique(sample) decrease performance?

ufuk beyaztas ufukbeyaztas at gmail.com
Sun Mar 20 15:36:46 CET 2011


Hi,

I am interested in how the results differ when each bootstrap sample keeps
all of its drawn rows versus only its distinct rows. When the sample keeps
all rows, the run takes about 120 seconds, but when it keeps only the
distinct rows, it takes about 4.5 to 5 times longer. I expected the
opposite, because unique(sample) has fewer rows than the full sample. The
code is as follows:

e <- rnorm(n=50, mean=0, sd=sqrt(0.5625))
x0 <- rep(1, 50)
x1 <- rnorm(n=50,mean=2,sd=1)
x2 <- rnorm(n=50,mean=2,sd=1)
x3 <- rnorm(n=50,mean=2,sd=1)
x4 <- rnorm(n=50,mean=2,sd=1)
y <- 1+ 2*x1+4*x2+3*x3+2*x4+e
x2[1] = 10     # influential observation
y[1] = 10      # influential observation

X <- matrix(c(x0,x1,x2,x3,x4),ncol=5)
Y <- matrix(y,ncol=1)
Design.data <- cbind(X, Y)
 
for (j in 1:nrow(X)) {

result <- vector("list", 3100)   # preallocate one slot per replicate

for (i in 1:3100) {

data <- Design.data[sample(50, 50, replace = TRUE), ]  ##### full sample; distinct rows only: unique(Design.data.....)
dataX <- data[,1:5]
dataY <- data[,6]

B.cap.simulation <- solve(crossprod(dataX)) %*% crossprod(dataX, dataY)
P.simulation <- dataX %*% solve(crossprod(dataX)) %*% t(dataX)
Y.cap.simulation <- P.simulation %*% dataY
e.simulation <- dataY - Y.cap.simulation
dX.simulation <- nrow(dataX) - ncol(dataX)
var.cap.simulation <- drop(crossprod(e.simulation)) / dX.simulation  # drop() turns the 1x1 matrix into a scalar
ei.simulation <- as.vector(dataY - dataX %*% B.cap.simulation)
pi.simulation <- diag(P.simulation)
var.cap.i.simulation <- ((dX.simulation * var.cap.simulation) / (dX.simulation - 1)) -
  (ei.simulation^2 / ((dX.simulation - 1) * (1 - pi.simulation)))
ti.simulation <- ei.simulation / sqrt(var.cap.simulation * (1 - pi.simulation))
ti.star.simulation <- ei.simulation / sqrt(var.cap.i.simulation * (1 - pi.simulation))
pi.star.simulation <- pi.simulation + ei.simulation^2 / sum(e.simulation^2)
WKi.simulation <- (ti.star.simulation)*sqrt(pi.simulation/(1-pi.simulation))
Wi.simulation <- WKi.simulation * sqrt((nrow(dataX)-1)/(1-pi.simulation))

result[[i]] <- list(outWi.simulation = Wi.simulation,
                    influ.obs = any(dataY == Y[j, ]))

}

i.obs <- sapply(result,function(x) {x$influ.obs})
ni.result <- result[! i.obs]
ni.Wi.simulation <- sapply(ni.result,function(x) {x$outWi.simulation})
if (j==1) {
ni.Wi.simulation1 <-  ni.Wi.simulation
}else if (j==2) {
ni.Wi.simulation49 <-  matrix(ni.Wi.simulation , nrow=1)

}else{
ni.Wi.simulation49 <- cbind(ni.Wi.simulation49, matrix(ni.Wi.simulation, nrow = 1))
}
}
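One way to narrow this down is to time the unique() step on its own. The sketch below is only an illustration. It uses a hypothetical random stand-in for Design.data, and it assumes Design.data has no duplicated rows (true almost surely for rnorm data), so dropping duplicate rows is the same as dropping duplicate sampled indices. Variant A calls unique() on the resampled numeric matrix, as in the loop above. Variant B calls unique() on the 50 integer indices and then subsets once.

```r
# Hypothetical stand-in for Design.data: 50 rows, intercept + 4 predictors + response
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 50
Design.data <- cbind(1, matrix(rnorm(n * 5), ncol = 5))
reps <- 3100

# Variant A: resample rows, then remove duplicated rows of the numeric matrix
t.rows <- system.time(
  for (i in seq_len(reps)) {
    idx <- sample(n, n, replace = TRUE)
    data.a <- unique(Design.data[idx, ])
  }
)

# Variant B: remove duplicated indices first, then subset the matrix once
t.idx <- system.time(
  for (i in seq_len(reps)) {
    idx <- sample(n, n, replace = TRUE)
    data.b <- Design.data[unique(idx), ]
  }
)

# Under the no-duplicate-rows assumption both variants keep the same rows
# in the same (first-occurrence) order
idx <- sample(n, n, replace = TRUE)
stopifnot(identical(unique(Design.data[idx, ]), Design.data[unique(idx), ]))

# Elapsed seconds for each variant (machine dependent)
print(c(rows = t.rows[["elapsed"]], indices = t.idx[["elapsed"]]))
```

If variant A is clearly slower, the extra time would come from unique() comparing whole numeric rows, repeated 50 x 3100 times in the full loop, rather than from the regression algebra on the smaller samples.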

Can someone give me an idea why? Many thanks.

--
View this message in context: http://r.789695.n4.nabble.com/Why-unique-sample-decreases-the-performance-tp3391199p3391199.html
Sent from the R help mailing list archive at Nabble.com.
