[R] Calculating RMSE in R from hurdle regression object

David March Morla david at imedea.uib-csic.es
Wed Mar 12 19:51:12 CET 2014


Dear Tim,

I think you will find a suite of different metrics for evaluating your
hurdle model in this paper:
Potts, Joanne M., and Jane Elith. "Comparing species abundance models." 
Ecological Modelling 199.2 (2006): 153-163.
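To make the pointer a little more concrete, here is a sketch of a few metrics commonly used for count/abundance models. It checks the two parts of a hurdle model separately: the AUC measures how well the predicted probability of a non-zero count separates zeros from non-zeros, and RMSE plus Pearson and Spearman correlations measure agreement of the counts. This is not claimed to be the exact set used by Potts and Elith (2006). `hurdle_metrics` is a hypothetical base-R helper. For a pscl fit, `mean_pred` could presumably come from `predict(H1, type = "response")` and `p_pos` from `1 - predict(H1, type = "prob")[, 1]`, but please check `?predict.hurdle` before relying on that.

```r
# Hypothetical helper: a few common checks for a two-part (hurdle) count model.
# observed:  observed counts
# p_pos:     predicted probability of a non-zero count, P(Y > 0 | x)
# mean_pred: predicted mean count, E[Y | x]
hurdle_metrics <- function(observed, p_pos, mean_pred) {
  pos <- observed > 0
  n1 <- sum(pos)    # number of non-zero observations
  n0 <- sum(!pos)   # number of zero observations
  # AUC via the Mann-Whitney rank formula: probability that a random
  # non-zero observation gets a higher p_pos than a random zero
  r <- rank(p_pos)
  auc <- (sum(r[pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  c(
    auc      = auc,
    rmse     = sqrt(mean((observed - mean_pred)^2)),
    pearson  = cor(observed, mean_pred, method = "pearson"),
    spearman = cor(observed, mean_pred, method = "spearman")
  )
}

# Toy call with made-up numbers:
hurdle_metrics(c(0, 0, 1, 3), c(0.1, 0.2, 0.8, 0.9), c(0.5, 0.5, 1, 2))
```

Splitting the evaluation this way avoids judging the zero part and the count part only through a single mean prediction.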

Best regards,
David March

On 12/03/2014 18:55, Tim Marcella wrote:
> Hi,
>
> My data is characterized by many zeros (82%) and overdispersion. I have
> chosen to model with hurdle regression (pscl package) with a negative
> binomial distribution for the count data. In an effort to validate the
> model I would like to calculate the RMSE of the predicted vs. the observed
> values. From my reading, I understand that RMSE is calculated on the raw
> residuals generated from the model output. This is the formula I used:
>
> H1.RMSE <- sqrt(mean(H1$residuals^2))     # where H1 is my fitted hurdle model
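If the goal is RMSE on the response (count) scale, it may be safer to compute it explicitly from observed values and predicted means rather than relying on which residual type is stored in the fitted object. A minimal base-R sketch; `rmse` is a hypothetical helper name. For a pscl fit the inputs would presumably be `H1$y` (stored when fitting with the default `y = TRUE`) and `predict(H1, type = "response")`.

```r
# Root mean squared error between observed values and predictions.
rmse <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  sqrt(mean((observed - predicted)^2))
}

# Toy check with made-up numbers (not the data from this thread):
# four zeros and one count of 10, each predicted by the sample mean 2.
obs  <- c(0, 0, 0, 0, 10)
pred <- c(2, 2, 2, 2, 2)
rmse(obs, pred)  # sqrt((4 * 4 + 64) / 5) = 4
```

Even when every prediction equals the sample mean, the RMSE (4) is twice the mean count, which is the same effect described below for zero-heavy data.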
>
> I get 46.7 as the RMSE. This seems high to me based on the model results.
> Assuming my formula and my understanding of RMSE are correct (and please
> correct me if I am wrong), I question whether this is an appropriate
> validation for this particular model structure. The hurdle model
> correctly predicts all of my zeros. The predictions I get from the fitted
> model are all values greater than zero. From my reading, I understand that
> the predictions from the fitted hurdle model are means generated for a
> particular covariate environment based on the model coefficients. If this
> is truly the case, it does not make sense to compare these means to the
> observations. This will generate large residuals (only 18% of the
> observations contain counts greater than 0, while the predicted counts all
> exceed 0). It seems like comparing apples to oranges. Other correlation
> measures (Pearson's r, Spearman's rho) would likewise compare the mean
> predicted value for a particular covariate combination with the observed
> values, which again are heavily dominated by zeros.
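The concern about comparing fitted means with individual, mostly zero observations can be checked by simulation. Even if the predicted means were exactly the true means, the RMSE would not shrink toward zero; it approaches the standard deviation of the outcome. The sketch below assumes a single covariate setting with hypothetical parameters (`p_pos`, `mu`, `theta` are illustrative values chosen so that about 82% of observations are zero, not fitted to any real data) and uses the standard moments of a hurdle negative binomial.

```r
set.seed(1)

# Hypothetical parameters for one covariate setting (not fitted to real data)
p_pos <- 0.18   # P(Y > 0): probability of crossing the hurdle
mu    <- 20     # mean of the untruncated negative binomial count part
theta <- 0.5    # negative binomial size (dispersion) parameter
n     <- 1e5

# P(Y = 0) under the untruncated negative binomial
f0 <- dnbinom(0, size = theta, mu = mu)

# Simulate a hurdle NB: zero with probability 1 - p_pos, otherwise a
# zero-truncated NB draw. Drawing u from (f0, 1) and inverting the NB CDF
# yields values >= 1 with the truncated distribution.
positive <- rbinom(n, 1, p_pos) == 1
y <- numeric(n)
y[positive] <- qnbinom(runif(sum(positive), min = f0, max = 1),
                       size = theta, mu = mu)

# Exact mean and standard deviation of this hurdle NB
ey   <- p_pos * mu / (1 - f0)
ey2  <- p_pos * (mu + mu^2 / theta + mu^2) / (1 - f0)  # E[Y^2]
sd_y <- sqrt(ey2 - ey^2)

# RMSE when every prediction is the exact true mean
rmse_true <- sqrt(mean((y - ey)^2))
c(prop_zero = mean(y == 0), mean_pred = ey, rmse = rmse_true, sd = sd_y)
```

Here every prediction is the correct mean (roughly 4.3), yet the RMSE comes out near the outcome's standard deviation (roughly 15.6). A large RMSE on its own therefore does not show that a hurdle model is misspecified; it partly reflects the variability of zero-heavy, overdispersed counts.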
>
> Any tips on how best to validate hurdle models in R?
>
> Thanks
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
David March Morlà
Spatial Ecologist
Email: david at imedea.uib-csic.es

IMEDEA
Instituto Mediterraneo de Estudios Avanzados (UIB-CSIC)
C/Miquel Marquès 21, 07190 Esporles, Balearic Islands. Spain
www.imedea.uib.es

SOCIB
Balearic Islands Coastal Observing and Forecasting System
Strategic Issues and Applications for Society (SIAS Division)
Parc Bit, Naorte, Bloc A 2ºp. pta. 3, 07121 Palma de Mallorca. Spain
Tel: +034 971 43 97 64
www.socib.es

SOCIAL MEDIA
Google Scholar: http://scholar.google.es/citations?user=xABsDpAAAAAJ
Research Gate: https://www.researchgate.net/profile/David_March3/
Linked In: http://www.linkedin.com/in/dmarch



