[R] Bootstrap tree selection in rpart

Fiona Callaghan fmc2+ at pitt.edu
Thu Sep 13 16:30:37 CEST 2007


Thanks very much for replying. Just one final question: does this hold
when the outcome is continuous rather than discrete, e.g., when instead of
a multinomial outcome we have a continuous outcome such as residuals?

Thanks again
Fiona
> Fiona Callaghan asked about using the bootstrap instead of
> cross-validation in the tree pruning step.
>    It turns out that cross-validation works better than the bootstrap for
> trees.  The issue is a subtle one.  The bootstrap can be thought of as 2
> steps.
>
> 1. Deduction: Evaluate the behavior of some statistic "zed" under repeated
> sampling from the discrete distribution F-hat, i.e., the original data.
> This gives a direct evaluation of how zed behaves under F-hat.
>
> 2. Induction: Assume that (behavior of zed under sampling from F) =
> (behavior under sampling from F-hat).
>
>   It turns out that trees behave differently under discrete distributions
> than they do under continuous ones, so step 2 fails.  Essentially, there
> are fewer places to split in the discrete case, tree creation is less
> noisy, and the bootstrap gives an overoptimistic view.  I remember Brad
> Efron giving a talk on this long ago (I was still a student!), so the
> details are fuzzy; I think that he solved it by sampling from a smoothed
> version of the empirical CDF.
>
>    Terry Therneau
>
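
To make the "fewer places to split" point concrete, here is a minimal base-R
sketch. It is only an illustration under stated assumptions, not a
reconstruction of the method from the talk mentioned above: the simulated
predictor x, the helper name smoothed_boot, and the use of bw.nrd0() as the
kernel bandwidth are all hypothetical choices made for the example. It
compares the number of distinct values (candidate split points) in an
ordinary bootstrap sample with the number in a smoothed one.

```r
# Illustrative sketch only. The data, the helper name smoothed_boot, and the
# bandwidth choice are assumptions made for this example.
set.seed(1)
n <- 200
x <- rnorm(n)   # stand-in for a continuous predictor

# Ordinary bootstrap: resample the observed values with replacement.
# Ties are essentially guaranteed, so a tree grown on this sample sees
# fewer distinct candidate split points than it would on fresh data.
plain_boot <- sample(x, size = n, replace = TRUE)

# Smoothed bootstrap: resample, then add Gaussian kernel noise to each draw,
# i.e. sample from a kernel-smoothed version of the empirical distribution.
smoothed_boot <- function(x, h = bw.nrd0(x)) {
  sample(x, size = length(x), replace = TRUE) + rnorm(length(x), sd = h)
}
smooth_sample <- smoothed_boot(x)

# Count the distinct values in each sample.
n_distinct <- c(original = length(unique(x)),
                plain    = length(unique(plain_boot)),
                smoothed = length(unique(smooth_sample)))
print(n_distinct)
```

On average an ordinary bootstrap sample contains only about 63% of the
original observations as distinct values, whereas the smoothed sample has no
ties. Whether smoothing of this kind is enough to make bootstrap-based
pruning competitive with cross-validation is not something this sketch
shows.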


-- 
Fiona Callaghan, MA MS
A432 Crabtree Hall
Department of Biostatistics
Graduate School of Public Health
University of Pittsburgh
Phone 412 624 3063


