[BioC] MiPP

Julian Lee julian at omniarray.com
Tue Jan 22 03:24:59 CET 2008



Hi all,

I'm relatively new to Bioconductor and am still figuring out how to use MiPP for my work.

From the help page in the MiPP documentation:

##########
#Example 1: When an independent test set is available

data(leukemia)

#Normalize combined data
leukemia <- cbind(leuk1, leuk2)
leukemia <- mipp.preproc(leukemia, data.type="MAS4")

#Train set
x.train <- leukemia[,1:38]
y.train <- factor(c(rep("ALL",27),rep("AML",11)))

#Test set
x.test <- leukemia[,39:72]
y.test <- factor(c(rep("ALL",20),rep("AML",14)))

#Compute MiPP
out <- mipp(x=x.train, y=y.train, x.test=x.test, y.test=y.test, probe.ID = 1:nrow(x.train), n.fold=5, percent.cut=0.05, rule="lda")

#Print candidate models
out$model

   Order Gene  Tr.ER Tr.MiPP Tr.sMiPP  Te.ER Te.MiPP Te.sMiPP Select
1      1  571 0.0526   30.86   0.8122 0.1176   23.92   0.7035       
2      2  436 0.0000   36.89   0.9707 0.0294   30.41   0.8945       
3      3  366 0.0000   37.95   0.9988 0.0294   31.35   0.9222       
4      4  457 0.0000   38.00   0.9999 0.0294   32.14   0.9453       
5      5  413 0.0000   38.00   1.0000 0.0294   32.18   0.9464       
6      6  635 0.0000   38.00   1.0000 0.0000   33.75   0.9927     **
7      7  648 0.0000   38.00   1.0000 0.0000   33.62   0.9889       
8      8  181 0.0000   38.00   1.0000 0.0294   31.99   0.9409       
9      9  309 0.0000   38.00   1.0000 0.0000   33.46   0.9842       
10    10   99 0.0000   38.00   1.0000 0.0882   28.56   0.8400   

I have some questions:

i) How do I identify the misclassified samples in the model, i.e. which samples in the training or test set were false positives or false negatives?

ii) The most parsimonious model is the one at order 6, with genes 571, 436, 366, 457, 413 and 635. Is it possible to identify the misclassified samples at earlier orders, e.g. order 1 with a Te.ER (test error rate) of 0.1176?

iii) Could I use the fitted model to validate on other independent datasets?
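
A minimal sketch of the kind of output questions (i) and (ii) ask for, under stated assumptions. `misclassified` is a hypothetical helper, not a MiPP function. It refits LDA with `lda()` from MASS on a chosen set of gene rows, which may not reproduce exactly what `mipp()` does internally, and it assumes the Gene column indexes rows of the expression matrix (as implied by `probe.ID = 1:nrow(x.train)`). Simulated data with the same sample layout stand in for the leukemia set so the block runs on its own.

```r
library(MASS)  # provides lda() and its predict() method

# Hypothetical helper (NOT part of MiPP): refit LDA on the given gene rows
# and report which test samples are misclassified.
# x.train, x.test: matrices with genes in rows and samples in columns.
misclassified <- function(x.train, y.train, x.test, y.test, genes) {
  # lda() expects samples in rows, so transpose the selected gene rows
  tr <- t(x.train[genes, , drop = FALSE])
  te <- t(x.test[genes, , drop = FALSE])
  colnames(tr) <- colnames(te) <- paste0("g", genes)

  fit  <- lda(tr, grouping = y.train)
  pred <- predict(fit, te)$class

  list(
    # rows: true class, columns: predicted class
    confusion = table(true = y.test, predicted = pred),
    # column indices (within x.test) of the misclassified samples
    wrong = which(as.character(pred) != as.character(y.test))
  )
}

# Simulated stand-in data (assumption: 10 genes, of which rows 1-3 are shifted in AML)
set.seed(1)
n.genes <- 10
sim.y.train <- factor(c(rep("ALL", 27), rep("AML", 11)))
sim.y.test  <- factor(c(rep("ALL", 20), rep("AML", 14)))
simulate <- function(y) {
  x <- matrix(rnorm(n.genes * length(y)), nrow = n.genes)
  x[1:3, y == "AML"] <- x[1:3, y == "AML"] + 2
  x
}
sim.x.train <- simulate(sim.y.train)
sim.x.test  <- simulate(sim.y.test)

res <- misclassified(sim.x.train, sim.y.train, sim.x.test, sim.y.test,
                     genes = c(1, 2, 3))
res$confusion  # true vs predicted class counts on the test set
res$wrong      # which test samples were misclassified
```

With the real data, the same call with `genes = 571` would correspond to order 1 and `genes = c(571, 436, 366, 457, 413, 635)` to order 6. For (iii), passing another dataset as `x.test` would only make sense if it is preprocessed the same way and has its rows in the same gene order.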

Thank you.

Regards,

Julian Lee
Bioinformatics Specialist
National Cancer Center Singapore

R version 2.5.1 (2007-06-27) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     

other attached packages:
    MiPP     MASS    e1071    class  Biobase 
 "1.8.0" "7.2-34" "1.5-16" "7.2-34" "1.14.1"


