[R] party ctree : getting a one-node tree
Yuliya Matveyeva
yuliya.rmail at gmail.com
Thu Jun 27 10:49:51 CEST 2013
Dear useRs,
I am currently using the ctree function (package "party") and am stuck on
the problem of getting a one-node tree (a tree with no splits, consisting
of the root only) even when I loosen all the stopping criteria as far as
possible (mincriterion = 0, minbucket = 0, minsplit = 0).
To check whether my data really contain no variable that is associated
with the response variable, I turned to the package "coin", written by the
same authors as the package "party", where the Strasser-Weber tests (used
inside the ctree function) are accessible as separate functions (whereas
they are encapsulated in the package "party").
But to my surprise, running the tests from that package manually yields
many variables for which the hypothesis of independence is rejected with a
very small p-value (less than 2e-16). So I would expect that even with the
Bonferroni correction I should still be getting some splits in my ctree
(with ~90 variables the Bonferroni per-test sig.level is ~0.01, given that
the overall sig.level is 1, the maximum).
So I am really confused... I would be really grateful if anyone could
please help me untangle this confusion...
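The Bonferroni reasoning above can be checked with a few lines of base R.
This is only a sketch of the arithmetic: the values 90 and 2e-16 are taken
from the description above, the variable names are illustrative, and which
adjustment formula ctree applies internally is not confirmed here, so both
the classical Bonferroni and the Sidak-style variant are shown.

```r
# Figures from the post: ~90 candidate predictors, overall sig.level = 1.
n_vars <- 90
alpha_overall <- 1

# Classical Bonferroni: each single test is compared against alpha / m.
alpha_per_test <- alpha_overall / n_vars

# A raw p-value of the size reported by the coin tests (upper bound 2e-16).
p_raw <- 2e-16

# Classical Bonferroni adjustment of the p-value, capped at 1.
p_bonf <- min(1, p_raw * n_vars)

# Sidak-style adjustment 1 - (1 - p)^m, computed in a numerically stable way
# (assumption: shown only for comparison, not confirmed as ctree's formula).
p_sidak <- -expm1(n_vars * log1p(-p_raw))

print(c(alpha_per_test = alpha_per_test, p_bonf = p_bonf, p_sidak = p_sidak))
```

Under either variant the adjusted p-value stays many orders of magnitude
below the per-test threshold of about 0.011, which is why a split would be
expected.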
I was not able to generate a small data matrix that reproduces the results
described above. And my original data is rather big (9000 rows and 90
variables).
Here is my code :
---------------------------
f_data <- "feature_matrix.txt-dummies-df1";
library(party)
# ///////////////////////////////////
df <- read.table(file = f_data, sep = "*", header = F,
colClasses = "character");
ids <- df[,1]; weights1 <- as.integer(df[,2]); df <- df[,-c(1,2)]
for (i in 1:ncol(df)) { df[,i] <- as.numeric(df[,i]); }
colnames(df)[ncol(df)] <- "dep_var";
# ------- delete variables with zero-variance ---------
vars_to_delete <- c();
for (i in 1:ncol(df)) {
if (!(var(df[,i]) > 0)) {
vars_to_delete <- c(vars_to_delete, i);
}
}
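# Guard: df[,-c()] would drop every column when nothing was flagged
if (length(vars_to_delete) > 0) { df <- df[,-vars_to_delete]; }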
# --------------------------------------------------------------------
signif_level1 <- 1
mincriterion1 <- 1 - signif_level1;
minbucket1 <- 0
teststat1 <- "quad"
testtype1 <- "Bonferroni"
system.time({ ctree.2 <- ctree(dep_var ~ ., data = df, weights = weights1,
controls = ctree_control(minbucket = minbucket1, minsplit = 2*minbucket1,
maxdepth = 0,
mincriterion = mincriterion1,
teststat = teststat1, testtype = testtype1,
savesplitstats = TRUE)) })
ctree.2
# here I get a tree with one node that is the root
# --------------------------------------------------------------------
library(coin)
p <- new("IndependenceProblem",
x = df[,colnames(df) != "dep_var"], y = df[,"dep_var", drop = F],
weights = weights1)
s <- independence_test(p,
teststat = teststat1, distribution = "asymptotic", alternative =
"two.sided" )
s
vars <- colnames(df); vars <- vars[vars != "dep_var"]
test_pvalues <- list()
for (v in vars) {
p <- new("IndependenceProblem",
x = df[,v, drop=F], y = df[,"dep_var", drop = F],
weights = weights1)
s <- independence_test(p,
teststat = teststat1, distribution = "asymptotic", alternative =
"two.sided" )
test_pvalues[[v]] <- pvalue(s)
}
test_pvalues
# here I get several p-values that are zero
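One detail of the zero-variance filter in the code above deserves care:
negative indexing with an empty index vector selects no columns at all
instead of keeping all of them. The sketch below shows this on a small,
hypothetical data frame (the names toy, zero_var, dropped_all and kept are
made up for illustration) and a logical-mask alternative that behaves the
same whether or not any column is flagged.

```r
# Hypothetical toy data: no column has zero variance.
toy <- data.frame(a = c(1, 2, 3), b = c(4, 6, 5), dep_var = c(0, 1, 0))

# Indices of zero-variance columns; this is empty for the toy data.
zero_var <- which(sapply(toy, var) == 0)

# Pitfall: -integer(0) is still integer(0), so no columns are selected.
dropped_all <- toy[, -zero_var]

# Safer: keep columns through a logical mask on the variances.
kept <- toy[, sapply(toy, var) > 0, drop = FALSE]

print(ncol(dropped_all))
print(ncol(kept))
```

The logical mask avoids the special case entirely, so no length check on
the index vector is needed.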
--
Sincerely yours,
Yulia Matveeva