[R] stack overflow and predict()

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Nov 8 07:09:01 CET 2003


That's not a sensible thing to do.  Supply predict.rpart with a data frame 
that contains just the variables rpart selected.

R does have limits, and attempting to use 10,000 variables is hitting
them. But surely any statistician is aware of the dangers of selecting
from 10,000 variables on just 100 observations?
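A minimal sketch of that advice in R follows. It assumes a recent R with the
recommended rpart package, uses simulated data with hypothetical column names
V1...Vp, and keeps p at 500 so it runs quickly. Refitting on only the selected
variables is one way to act on the advice, not necessarily the only one.

```r
library(rpart)  # recommended package, ships with R

## Simulated stand-in data (hypothetical names V1..Vp); p kept small here.
set.seed(42)
n <- 100
p <- 500
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("V", seq_len(p))))
y <- factor(ifelse(x[, "V1"] + x[, "V2"] > 0, "yes", "no"))

## Fit once on all variables, held in a data frame with named columns.
fit_all <- rpart(y ~ ., data = data.frame(y = y, x), method = "class")

## Names of the variables rpart actually used in its primary splits.
used <- setdiff(unique(as.character(fit_all$frame$var)), "<leaf>")

## Refit on just those variables.
fit <- rpart(y ~ ., data = data.frame(y = y, x[, used, drop = FALSE]),
             method = "class")

## New observations: a data frame whose column names match the training ones.
newx <- matrix(rnorm(5 * p), nrow = 5, dimnames = list(NULL, colnames(x)))
newdata <- as.data.frame(newx[, used, drop = FALSE])
pred <- predict(fit, newdata = newdata, type = "class")
print(pred)
```

The key point is that newdata is a data frame whose column names match the
variables in the model formula. With a matrix term such as y ~ x and a newdata
that has no element called x, model.frame() will likely pick up the training
x from the formula's environment instead, which would explain predict()
returning the fitted values.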

On Fri, 7 Nov 2003, Ji Zhu wrote:

> 
> Dear R users,
> 
> I'm trying to use rpart() to build a classification tree on a big dataset.
> The number of samples is n=100 and the number of variables is p=10000.
> 
> At first I stored all the data in a data.frame and got a "stack overflow"
> error; then I changed the data into a matrix and the problem disappeared.
> Now the trouble is when I try to use the predict() function, since each
> newdata is a long list with p=10000 elements, the predict() function
> doesn't recognize it and simply returns the fitted values at the training
> data (rather than the newdata).
> 
> Could anyone give me some suggestions on how to proceed?  Thank you.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
