[R] logistic regression tree

Frank Harrell f.harrell at vanderbilt.edu
Sat Aug 21 01:30:38 CEST 2010



On Fri, 20 Aug 2010, Kay Cichini wrote:

>
> hello,
>
> my data collection is not yet finished, but i have already started
> investigating possible analysis methods.
>
> below i give a very close simulation of my future data set; however, there
> might be more nominal explanatory variables - there will be no continuous
> ones at all (maybe some ordered nominal ones).
>
> i tried several packages today, but the one i liked most was ctree from the
> party package.
> i can't see why the given number of data points (n=100) might pose a problem
> here - but please correct me, as i might be naive.

See

http://biostat.mc.vanderbilt.edu/wiki/Main/ComplexDataJournalClub#Sebastiani_et_al_Nature_Genetics

The recursive partitioning simulation there will give you an idea;
you can modify the R code to simulate a situation more like yours.
When you simulate the true patterns and see how far the tree is from
discovering them, you'll be surprised.
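
A rough sketch of that kind of check, applied to the data structure from the original post, might look like the following. This is not the code from the linked page. The helper names simulate_data() and evaluate_tree(), the 200 replications, and the seed are all assumptions made for illustration; where() and treeresponse() are accessor functions from the party package. Each replication records how many terminal nodes the tree has and how far its estimated presence probabilities are from the true ones.

```r
library(party)

# One simulated data set with the structure from the original post:
# presence probability p_true where fac1 == "I" and fac3 is a, b or c,
# and 0 everywhere else. fac2 has no effect.
simulate_data <- function(p_true = 0.75) {
  dat <- expand.grid(fac1 = c("I", "II"),
                     fac2 = LETTERS[1:5],
                     fac3 = letters[1:10])
  dat$p <- ifelse(dat$fac1 == "I" & dat$fac3 %in% c("a", "b", "c"), p_true, 0)
  dat$Y <- factor(rbinom(nrow(dat), 1, dat$p), levels = c(0, 1))
  dat
}

# Fit a tree and summarise how close it gets to the known truth.
evaluate_tree <- function(dat) {
  tr <- ctree(Y ~ fac1 + fac2 + fac3, data = dat)
  # estimated P(Y = 1) for each row, taken from its terminal node
  p_hat <- sapply(treeresponse(tr), function(pr) pr[2])
  c(n_leaves = length(unique(where(tr))),
    mean_abs_error = mean(abs(p_hat - dat$p)))
}

set.seed(2010)  # arbitrary seed, only for reproducibility
res <- t(replicate(200, evaluate_tree(simulate_data())))

table(res[, "n_leaves"])          # how often the tree has 1, 2, 3, ... leaves
summary(res[, "mean_abs_error"])  # distance from the true probabilities
```

Lowering p_true, adding more noise factors, or giving the factors more levels in simulate_data() shows how quickly the recovered tree drifts away from the true pattern.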

Frank

>
> i'd be very glad of comments on the use of ctree on such a data set and
> on any pitfalls i may be overlooking.
>
> thank you all,
> kay
>
> ######################################################################################
> # an example with 3 nominal explanatory variables:
> # Y is the presence of a certain invasive plant species.
> # an effect is introduced for fac1 and fac3; fac2 has no effect.
> # presence has prob. 0.75 in the factor combination fac1 = I (say fac1 is
> # geographic region) and fac3 = a|b|c (say all richer substrates); it is 0
> # elsewhere.
> # presence is not influenced by fac2, which might be e.g. vegetation type.
> ######################################################################################
> library(party)
> set.seed(1)  # added so that the simulation is reproducible
>
> # full factorial design: 2 x 5 x 10 = 100 rows
> dat <- expand.grid(fac1 = c("I", "II"),
>                    fac2 = LETTERS[1:5],
>                    fac3 = letters[1:10])
>
> print(dat <- dat[order(dat$fac1, dat$fac2, dat$fac3), ])
>
> # presence probability: 0.75 where fac1 is I and fac3 is a, b or c; 0 elsewhere
> p <- ifelse(dat$fac1 == "I" & dat$fac3 %in% c("a", "b", "c"), 0.75, 0)
> dat$Y <- factor(rbinom(nrow(dat), 1, p))
>
> tr <- ctree(Y ~ fac1 + fac2 + fac3, data = dat)
> plot(tr)
> ######################################################################################
>
>
> -----
> ------------------------
> Kay Cichini
> Postgraduate student
> Institute of Botany
> Univ. of Innsbruck
> ------------------------
>


