[BioC] undefined columns selected error when using bagging{ipred}

Constanze [guest] guest at bioconductor.org
Wed Sep 5 17:21:28 CEST 2012


Dear All,

I'm trying to reproduce the results of the survival analysis in Chapter 17, p. 307, of "Bioinformatics and Computational Biology Solutions using R and Bioconductor" using the code chunks from http://www.bioconductor.org/help/publications/books/bioinformatics-and-computational-biology-solutions/chapter-code/Computational_Inference.R
The call to the bagging function throws an error, even though I reduced the number of selected variables to p = 25 (so that the model fit would not be over-determined). The code is below.

Thanks a lot,

Constanze


> library("exactRankTests")
 Package ‘exactRankTests’ is no longer under development.
 Please consider using package ‘coin’ instead.

> # library("coin")
> library("ipred")
Loading required package: rpart
Loading required package: MASS
Loading required package: mlbench
Loading required package: nnet
Loading required package: class
> library("kidpack")

*** Deprecation warning ***:
The package 'kidpack' is deprecated and will not be supported after Bioconductor release 2.1.


> data(eset)
> var_selection <- function(indx, expressions, response, p = 100) {
+ 
+     y <- switch(class(response),
+         "factor" = { model.matrix(~ response - 1)[indx, ,drop = FALSE] },
+         "Surv" = { matrix(cscores(response[indx]), ncol = 1) },
+         "numeric" = { matrix(rank(response[indx]), ncol = 1) }
+     )
+ 
+     x <- expressions[,indx, drop = FALSE]
+     n <- nrow(y)
+     linstat <- x %*% y
+     Ey <- matrix(colMeans(y), nrow = 1)
+     Vy <- matrix(rowMeans((t(y) - as.vector(Ey))^2), nrow = 1)
+ 
+     rSx <- matrix(rowSums(x), ncol = 1)   
+     rSx2 <- matrix(rowSums(x^2), ncol = 1)
+     E <- rSx %*% Ey
+     V <- n / (n - 1) * kronecker(Vy, rSx2)
+     V <- V - 1 / (n - 1) * kronecker(Vy, rSx^2)
+ 
+     stats <- abs(linstat - E) / sqrt(V)
+     stats <- do.call("pmax", as.data.frame(stats))
+     return(which(stats > sort(stats)[length(stats) - p]))
+ }
> 
> 
> remove <- is.na(eset$survival.time)
> seset <- eset[,!remove]
> response <- Surv(seset$survival.time, seset$died)
> response[response[,1] == 0] <- 1
> expressions <- t(apply(exprs(seset), 1, rank))
> exprDF <- as.data.frame(t(expressions))
> 
> I <- nrow(exprDF)
> Iindx <- 1:I
> selected <- var_selection(Iindx, expressions, response,p=25)
> bagg <- bagging(response ~., data = exprDF[,selected],ntrees = 100)
Error in `[.data.frame`(m, attr(Terms, "term.labels")) : 
  undefined columns selected
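
One possible cause, offered as an assumption and not as a confirmed diagnosis: the error message shows a data frame being indexed by the formula's term labels, and that lookup can fail when the column names are not syntactically valid R names (for example, purely numeric probe identifiers), because the term labels are then quoted with backticks. The sketch below uses a made-up toy data frame (toy, y, and the column names are hypothetical and do not come from the kidpack data) to show how such names could be detected and renamed with make.names().

```r
# Toy data frame with non-syntactic (purely numeric) column names.
# It only stands in for exprDF[, selected]; the values are random.
set.seed(1)
toy <- data.frame(matrix(rnorm(20), nrow = 5))
colnames(toy) <- c("101", "202", "303", "404")
y <- rnorm(5)

# Names that differ from their make.names() version are non-syntactic.
bad <- colnames(toy) != make.names(colnames(toy))
print(colnames(toy)[bad])

# Expanding the dot in a formula quotes such names with backticks,
# so the term labels no longer match the raw column names.
labels <- attr(terms(y ~ ., data = toy), "term.labels")
matches <- labels %in% colnames(toy)
print(labels)
print(matches)

# Possible workaround: use syntactic, unique names before modelling.
colnames(toy) <- make.names(colnames(toy), unique = TRUE)
print(colnames(toy))
```

If the same check on colnames(exprDF) reports non-syntactic names, renaming the columns before the bagging call may be worth trying; whether that resolves this particular error has not been verified here.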


 -- output of sessionInfo(): 

R version 2.15.1 (2012-06-22)
Platform: i486-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=de_DE.utf8       LC_NUMERIC=C             
 [3] LC_TIME=de_DE.utf8        LC_COLLATE=de_DE.utf8    
 [5] LC_MONETARY=de_DE.utf8    LC_MESSAGES=de_DE.utf8   
 [7] LC_PAPER=C                LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] kidpack_1.5.10        ipred_0.8-8           class_7.3-4          
 [4] nnet_7.3-4            mlbench_2.1-1         MASS_7.3-21          
 [7] rpart_3.1-54          exactRankTests_0.8-22 affy_1.26.0          
[10] Biobase_2.8.0         survival_2.36-14     

loaded via a namespace (and not attached):
[1] affyio_1.16.0         preprocessCore_1.10.0 tools_2.15.1         


--
Sent via the guest posting facility at bioconductor.org.


