[R] goodness of "prediction" using a model (lm, glm, gam, brt, regression tree .... )

Kingsford Jones kingsfordjones at gmail.com
Thu Sep 3 16:06:14 CEST 2009


There are many ways to measure prediction quality, and which one you
choose depends on the data and your goals.  A common measure for a
quantitative response is mean squared error, i.e. 1/n * sum((observed
- predicted)^2), computed on the data held out from fitting (your set
B).  It incorporates both bias and variance.  The usual terms for what
you are looking for are "test error" and "generalization error".
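
As a rough sketch of that calculation (the data are simulated purely
for illustration; only the names A, B, Y and T come from the question
quoted below, and lm() stands in for whichever model you actually
fit), the test MSE could be computed along these lines:

```r
# Simulated data, purely illustrative: T is a temperature-like predictor.
set.seed(1)
n <- 200
dat <- data.frame(T = runif(n, 0, 30))
dat$Y <- 2 + 0.5 * dat$T + rnorm(n, sd = 1.5)

A <- dat[1:150, ]    # training set
B <- dat[151:200, ]  # held-out set for which Y was also observed

# Fit on A only, then predict the response on B.
fit  <- lm(Y ~ T, data = A)
pred <- predict(fit, newdata = B)

# Test MSE: 1/n * sum((observed - predicted)^2), evaluated on B.
err       <- B$Y - pred
test_mse  <- mean(err^2)
test_rmse <- sqrt(test_mse)  # same units as Y

# The MSE on B splits exactly into the squared mean error (a bias term)
# plus the spread of the errors around that mean (a variance term).
bias    <- mean(err)
err_var <- mean((err - bias)^2)

# Training MSE for comparison; it is usually optimistic.
train_mse <- mean(residuals(fit)^2)

print(c(train_mse = train_mse, test_mse = test_mse, test_rmse = test_rmse))
```

The same pattern should carry over to glm or gam fits by swapping the
fitting call, as long as predict() is asked for predictions on the
response scale.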


hth,
Kingsford



On Wed, Sep 2, 2009 at 11:56 PM, Corrado<ct529 at york.ac.uk> wrote:
> Dear R-friends,
>
> How do you test the goodness of prediction of a model, when you predict on a
> set of data DIFFERENT from the training set?
>
> To explain: you train your model M (e.g. glm, gam, regression tree, brt)
> on a data set A with a response variable Y. You then predict the value of
> that same response variable Y on a different data set B (e.g. with
> predict.glm, predict.gam and so on). Data sets A and B differ in the sense
> that they contain the same variable, for example temperature, measured at
> different sites or over a different interval (e.g. B is a subinterval of A
> for interpolation, or a different interval for extrapolation). If you have
> the measured values of Y on the new interval, i.e. B, how do you measure how
> good the prediction is, that is, how well the model fits Y on B (how well
> it predicts)?
>
> In other words:
>
> Y ~ T, data = A   for training
> Y ~ T, data = B   for predicting
>
> I have devised a couple of methods based on 1) standard deviation and
> 2) R^2, but I am unhappy with them.
>
> Regards
> --
> Corrado Topi
>
> Global Climate Change & Biodiversity Indicators
> Area 18,Department of Biology
> University of York, York, YO10 5YW, UK
> Phone: + 44 (0) 1904 328645, E-mail: ct529 at york.ac.uk



