[R] logistic regression tree

Achim Zeileis Achim.Zeileis at uibk.ac.at
Fri Aug 20 00:28:36 CEST 2010


On Thu, 19 Aug 2010, Gavin Simpson wrote:

> On Thu, 2010-08-19 at 13:42 -0700, Kay Cichini wrote:
>> hello everyone,
>>
>> i sampled 100 stands at 20 restoration sites and recorded the presence of
>> 3 different invasive plant species.
>> i came across logistic regression trees and wonder if this is suited for my
>> purpose - predicting presence of these problematic invasive plant species
>> (one by one) by a set of recorded ecological / geographical parameters.
>> i'd be glad if someone would comment on applying this method to such data -
>> maybe someone could point me to useful references.
>> also, i was not able to find out if there is a package implementing logistic
>> regression trees?
>
> Not sure what a logistic regression tree is, but a classification tree
> would be useful here: Treat each species as present (== 1) or absent (==
> 0) and try to fit a tree consisting of a set of splits in X covariates
> that minimise a suitable deviance criterion.
>
> If you want to fit all three species at once, try multivariate trees,
> but IIRC, they (in package mvpart at least) expect a count-based data
> set, i.e. the deviance criterion they used (sum of squares) is probably
> not suited to binary type data.

To add to Gavin's comments about the modeling techniques:

ctree() in package "party" supports recursive partitioning of multivariate 
responses of arbitrary types (numeric, categorical, censored, etc.).
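
A rough sketch of what such a call could look like is given below. The data
are simulated, and the names `stands`, `elevation`, `moisture`, `sp1` to `sp3`
and the helper `sim_presence()` are made up for illustration only; they are
not from the original question. It assumes package "party" is installed.

```r
library(party)

set.seed(1)
n <- 100

# Simulated stand data; 'elevation' and 'moisture' are invented covariates.
stands <- data.frame(
  elevation = runif(n, 500, 2000),
  moisture  = runif(n)
)

# Hypothetical helper: simulate presence/absence as a two-level factor.
sim_presence <- function(p) {
  factor(rbinom(length(p), 1, p), levels = c(0, 1),
         labels = c("absent", "present"))
}
stands$sp1 <- sim_presence(plogis(-3 + 6 * stands$moisture))
stands$sp2 <- sim_presence(ifelse(stands$elevation > 1200, 0.7, 0.2))
stands$sp3 <- sim_presence(rep(0.4, n))

# One species at a time: a conditional inference classification tree.
ct_sp1 <- ctree(sp1 ~ elevation + moisture, data = stands)
print(ct_sp1)

# All three species at once: multivariate response on the left-hand side.
ct_all <- ctree(sp1 + sp2 + sp3 ~ elevation + moisture, data = stands)
print(ct_all)
```

With only 100 stands the fitted trees may well have few or no splits, which
is in line with the sample-size concerns raised elsewhere in this thread.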

Function mob() in the same package can also be used for partitioning based 
on logistic regressions. See the manual pages for further references.
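
As a minimal sketch of a logistic regression tree via mob(): the formula has
the form `response ~ regressors | partitioning variables`, so a logistic
regression is fitted in every terminal node. The data below are simulated,
and `stands`, `moisture`, `elevation`, `site_age` and `present` are
hypothetical names chosen for illustration, not variables from the original
question.

```r
library(party)

set.seed(2)
n <- 100

# Simulated data; all variable names are invented for this example.
stands <- data.frame(
  moisture  = runif(n),
  elevation = runif(n, 500, 2000),
  site_age  = sample(1:30, n, replace = TRUE)
)

# Presence depends on moisture, with a different relationship
# below and above 1200 m elevation.
eta <- ifelse(stands$elevation < 1200,
              -2 + 4 * stands$moisture,
               1 - 3 * stands$moisture)
stands$present <- factor(rbinom(n, 1, plogis(eta)), levels = c(0, 1),
                         labels = c("no", "yes"))

# Logistic regression of presence on moisture, partitioned by
# elevation and site age.
lrt <- mob(present ~ moisture | elevation + site_age,
           data = stands,
           model = glinearModel, family = binomial(),
           control = mob_control(minsplit = 20))

print(lrt)
coef(lrt)   # coefficients of the node-wise logistic regressions
```

The choice of `minsplit = 20` is arbitrary here; with small data sets it
mainly keeps the node-wise regressions from being fitted on a handful of
observations.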

Also, the machine learning and environmetrics task views at

   http://CRAN.R-project.org/view=MachineLearning
   http://CRAN.R-project.org/view=Environmetrics

have some more pointers.
Z

> The one problem I foresee is that you only have 100 data points and even
> that number is pseudo replicated as you have multiple samples from just
> 20 "sites". Trees are unstable at the best of times and work best when
> given a lot of data. Boosting, bagging and randomForests can help but
> they again work best/well with large data sets. I suppose large will be
> relative to the signal to noise ratio in your data.
>
> Ecologically, one needs to consider what a 0 value means (an absence):
> was the invasive not present due to the environment being bad or just
> because it hasn't got there yet despite environment being good? How you
> deal with that is anybody's guess.
>
> Try the R-SIG-Ecology list for further help.
>
> G
>
>>
>> thanks in advance,
>> kay
>>
>> -----
>> ------------------------
>> Kay Cichini
>> Postgraduate student
>> Institute of Botany
>> Univ. of Innsbruck
>> ------------------------
>>
>
> -- 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


