[R] gamlss to predict dependent variables in [0, 1] interval (fractional variable)
Janka VANSCHOENWINKEL
janka.vanschoenwinkel at uhasselt.be
Tue Dec 27 11:20:17 CET 2016
Dear R-users,
I want to model a proportion bounded by [0,1] (the % of land
fertilized). A large share of the observations are exactly 0 (60%), a smaller
share are exactly 1 (10%), and the rest fall in between.
I want to compare the performance of several models; however, the model I
am currently looking at is a zero-one inflated beta model. I am using the R
package gamlss for this.
However, I am having some trouble with the rather technical documentation
of the gamlss package and I cannot find an answer to my questions
below:
1) model
The model below should consist of three submodels: one part that models the
probability of having y=0 versus y>0 (nu.formula), one part that models the
probability of having y=1 versus y<1 (tau.formula), and a final part that
models all the values in between.
library(gamlss)
gam <- gamlss(proportion ~ x1 + x2, nu.formula = ~ x1 + x2,
              tau.formula = ~ x1 + x2, family = BEINF, data = Alldata)
This is okay I think.
2) prediction
I would now like to know the predicted probability that an observation has
y = 0 or y = 1. I predicted the probability of y = 0 with the code below,
but I get values far outside the [0,1] interval, so they cannot be
probabilities.
Alldata$fit_proportion_0 <- predict(gam, what = "nu", type = "response")
summary(Alldata$fit_proportion_0)
Could somebody explain to me how to obtain the correct probabilities? The
code above does not seem to work. I think the answer to my problem can
be found in section 10.8.2, page 215 of the following document (
http://www.gamlss.org/wp-content/uploads/2013/01/book-2010-Athens1.pdf). I
think it says that the predict function I use returns a different quantity,
which I have to plug into a certain formula to obtain the actual
probabilities. But I am not sure how to make this work.
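My best guess so far is sketched below. It assumes that, for BEINF, the fitted nu and tau are positive quantities and that the probabilities follow as P(y=0) = nu/(1+nu+tau) and P(y=1) = tau/(1+nu+tau). The helper beinf_probs is my own hypothetical function, not part of gamlss, and the toy values are made up to roughly match the 60%/10% shares in my data. I would be grateful if somebody could confirm whether this is the intended formula.

```r
# Hypothetical helper (not part of gamlss): convert fitted BEINF nu and tau
# values into the probabilities of y = 0, 0 < y < 1 and y = 1.
# Assumption: P(y=0) = nu / (1 + nu + tau), P(y=1) = tau / (1 + nu + tau).
beinf_probs <- function(nu, tau) {
  stopifnot(all(nu > 0), all(tau > 0), length(nu) == length(tau))
  denom <- 1 + nu + tau
  data.frame(
    p0        = nu / denom,   # probability of exactly 0
    p_between = 1 / denom,    # probability of a value strictly inside (0, 1)
    p1        = tau / denom   # probability of exactly 1
  )
}

# Toy values, for illustration only (not taken from my fitted model)
probs <- beinf_probs(nu = c(2, 0.2), tau = c(1 / 3, 0.05))
print(probs)
print(rowSums(probs))  # the three probabilities in each row should sum to 1

# With the fitted model from above (assumes `gam` and `Alldata` exist):
# nu_hat  <- predict(gam, what = "nu",  type = "response")
# tau_hat <- predict(gam, what = "tau", type = "response")
# Alldata$fit_proportion_0 <- beinf_probs(nu_hat, tau_hat)$p0
```

If the assumption holds, the values I got from predict(..., what="nu") would be odds-like quantities rather than probabilities, which would explain why they exceed 1.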
3) interpretation
Also, to be sure, I would like to know how to interpret the different
coefficients of the three models and how to use the coefficients separately
to determine. For the Nu and Tau models these should be interpreted as
log-odds ratios, right? And the model in the middle is just a normal
log-model, right?
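To make the question about nu and tau concrete, this is the relation I have in mind. It assumes the default log links for nu and tau in BEINF; the symbols p_0, p_1, x_i and the coefficient vectors are my own notation rather than anything taken from the documentation:

```latex
\log \nu_i = \log\frac{p_{0,i}}{1 - p_{0,i} - p_{1,i}} = x_i^\top \beta^{(\nu)},
\qquad
\log \tau_i = \log\frac{p_{1,i}}{1 - p_{0,i} - p_{1,i}} = x_i^\top \beta^{(\tau)}
```

If this is right, the exponential of a nu coefficient would be the multiplicative change in the odds of y = 0 relative to 0 < y < 1 (rather than relative to y > 0) for a one-unit increase in the covariate, and similarly for tau and y = 1.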
4) validity
Finally, I cannot find much information on how to correctly test the
validity of this model. Do you test that for all three subparts separately?
Or is there a test to validate the entire model at once?
Thank you very much for your help! I am aware that some of these
questions are very basic.
Janka