[R] party package: ctree - survival data - extracting statistics/predictors

Sarah Bonnin Sarah.Bonnin at crg.eu
Thu Aug 23 17:41:31 CEST 2012


Dear R users,

I am trying to apply the analysis described in a paper to the data I am working with.

The data consist of 80 patients for whom I have survival data (time in days, and a binary event indicator) and microarray expression data for 200 genes (continuous predictor variables).
My data matrix "data.test" has 202 columns and 80 rows.

What I want to do is: 
- run recursive partitioning on these data to obtain groups of patients that are homogeneous in terms of survival/prognosis;
- extract the "correlation" of each gene's expression (for each of the 200 genes) with recurrence-free survival (time and event): I want to know which variables best predict a poor or good prognosis based on the survival data.

I am using the function "ctree" from the "party" package.

I came up with this command:
test <- ctree(Surv(time, event) ~ .,
              data = data.test,
              controls = ctree_control(teststat = "max", testtype = "Bonferroni",
                                       mincriterion = 0.95, savesplitstats = TRUE),
              ytrafo = function(data) trafo(data, numeric_trafo = rank),
              xtrafo = function(data) trafo(data, surv_trafo = logrank_trafo(data, ties.method = "logrank")))
It works, but since I am not a statistician I find it quite confusing, and I may not be running it properly.
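According to the ctree documentation, ytrafo is applied to the response (here the Surv object) and xtrafo to the input variables (here the 200 genes); coin's trafo() also expects surv_trafo to be a function rather than the result of calling one. In the command above, the rank transformation is requested through ytrafo and the log-rank transformation through xtrafo. If the intent was log-rank scores for the survival response and ranks for the gene expression values, a sketch that assigns each transformation by role might look like this. The simulated data.test (80 patients, genes named gene1 ... gene200) is hypothetical and only makes the example self-contained; ties.method is left at its default because its accepted values appear to differ between coin versions.

```r
library(survival)  # Surv()
library(coin)      # trafo(), logrank_trafo()
library(party)     # ctree(), ctree_control()

## Hypothetical stand-in for data.test: time, event and 200 gene columns
set.seed(1)
n <- 80
n_genes <- 200
expr <- matrix(rnorm(n * n_genes), nrow = n,
               dimnames = list(NULL, paste("gene", seq_len(n_genes), sep = "")))
data.test <- data.frame(time = rexp(n, rate = 0.01),
                        event = rbinom(n, size = 1, prob = 0.7),
                        expr)

test <- ctree(Surv(time, event) ~ ., data = data.test,
              controls = ctree_control(teststat = "max", testtype = "Bonferroni",
                                       mincriterion = 0.95, savesplitstats = TRUE),
              ## ytrafo acts on the response: log-rank scores for the Surv object
              ytrafo = function(data) trafo(data, surv_trafo = logrank_trafo),
              ## xtrafo acts on the inputs: rank-transform each gene's expression
              xtrafo = function(data) trafo(data, numeric_trafo = rank))
print(test)
```

Whether ranks on the inputs are appropriate for these microarray data is a modelling choice, not something the sketch settles.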

My technical problem is that I would like to extract the statistics from my "test" object (class BinaryTree), i.e. the p-value of each of the 200 comparisons (survival data versus each gene), so that I can tell which genes are really associated with each node of the tree.

I tried:
test@tree$criterion$statistic
but its maximum value is 16, so I assume it is not a p-value as such: what is it?
and:
test@tree$criterion$criterion
whose maximum value is 0.96 and minimum value is 0; only one value is > 0.95.
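As far as I understand party's BinaryTree objects, each node in test@tree is a list whose criterion element holds, for every input variable, the test statistic (statistic) and 1 minus the p-value, here Bonferroni-adjusted (criterion); that would be consistent with statistic reaching 16 while criterion stays between 0 and 1. Under that assumption, a minimal sketch for collecting p-values at the root and at every inner node is below. The helper inner_pvalues() is hypothetical, the simulated data are made up, and the variable names are assumed to follow the order of test@data@get("input").

```r
library(survival)
library(party)

## Hypothetical data: 80 patients, 200 genes; gene1 shifts the hazard
set.seed(2)
n <- 80
n_genes <- 200
expr <- matrix(rnorm(n * n_genes), nrow = n,
               dimnames = list(NULL, paste("gene", seq_len(n_genes), sep = "")))
surv_time <- rexp(n, rate = ifelse(expr[, "gene1"] > 0, 0.02, 0.005))
data.test <- data.frame(time = surv_time, event = rbinom(n, 1, 0.8), expr)

test <- ctree(Surv(time, event) ~ ., data = data.test,
              controls = ctree_control(testtype = "Bonferroni",
                                       mincriterion = 0.95))

## Input variable names, assumed to match the order of node$criterion
varnames <- names(test@data@get("input"))

## p-values at the root node (criterion is taken to be 1 - p-value)
root_p <- 1 - test@tree$criterion$criterion
names(root_p) <- varnames
head(sort(root_p), 5)

## Hypothetical helper: recurse through the tree and collect the p-values
## of every inner (non-terminal) node, labelled by node ID
inner_pvalues <- function(node, varnames) {
  if (node$terminal) return(list())
  p <- 1 - node$criterion$criterion
  names(p) <- varnames
  here <- list(p)
  names(here) <- paste("node", node$nodeID, sep = "")
  c(here,
    inner_pvalues(node$left, varnames),
    inner_pvalues(node$right, varnames))
}

pvals <- inner_pvalues(test@tree, varnames)
## Smallest p-value (and the gene it belongs to) at each inner node
sapply(pvals, function(p) p[which.min(p)])
```

If the root is not split, pvals is simply an empty list, while root_p is still available.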

str(test) gives a lot of information, but at the moment it confuses me more than it helps.

I want to know:
- whether my "ctree" command makes sense to people who have more experience than me with this kind of data;
- which elements of "test" represent which statistics, and how to interpret them. As I understand it, setting "mincriterion" to 0.95 is equivalent to setting a p-value threshold of 0.05 (ctree help: "when 'mincriterion = 0.95', the p-value must be smaller than 0.05 in order to split this node.")
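The relation quoted from the ctree help can be illustrated with base R alone. This is a sketch assuming that criterion behaves like 1 minus a Bonferroni-adjusted p-value as computed by p.adjust(); party's internal multiplicity adjustment may differ in detail. The p-values are simulated placeholders, not results. The point is that with testtype = "Bonferroni" and 200 genes, mincriterion = 0.95 corresponds to an adjusted p-value below 0.05, i.e. an unadjusted p-value of roughly 0.05 / 200 = 0.00025 or less for the best gene.

```r
## Simulated placeholder p-values for 200 genes (not real results)
set.seed(3)
raw_p <- runif(200)
raw_p[1] <- 1e-5   # one strongly associated gene, purely illustrative

## Bonferroni adjustment: min(1, 200 * p)
adj_p <- p.adjust(raw_p, method = "bonferroni")

## Quantity compared against mincriterion (assumed: 1 - adjusted p-value)
criterion <- 1 - adj_p
mincriterion <- 0.95

## Genes that would pass the threshold, i.e. adjusted p-value below 0.05
passing <- which(criterion > mincriterion)
passing

## Unadjusted p-value cutoff implied by a Bonferroni correction over 200 tests
raw_cutoff <- (1 - mincriterion) / length(raw_p)
raw_cutoff
```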

I hope my explanation is clear; I might be completely mistaken, so any tips or guidance are more than welcome.

Thanks!
Sarah



sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
 [1] stats4    grid      splines   stats     graphics  grDevices utils     datasets  methods  
[10] base     

other attached packages:
 [1] biomaRt_2.10.0    party_1.0-2       vcd_1.2-13        colorspace_1.1-1  MASS_7.3-20      
 [6] strucchange_1.4-7 sandwich_2.2-9    zoo_1.7-7         coin_1.0-21       mvtnorm_0.9-9992 
[11] modeltools_0.2-19 survival_2.36-14 

loaded via a namespace (and not attached):
[1] lattice_0.20-6 RCurl_1.91-1.1 tools_2.14.2   XML_3.9-4.1  







------------------
Sarah Bonnin
Bioinformatician
Genomics Unit - Office 439.01
Centre for Genomic Regulation
C/ Dr. Aiguader, 88
08003 Barcelona, Spain
Tel. +34 93-316-0373
www.crg.eu


