[R] Constrained Poisson model / Bayesian Poisson model

David Winsemius dwinsemius at comcast.net
Sat Jan 23 19:49:19 CET 2016


> On Jan 23, 2016, at 9:40 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> 
> 
>> On Jan 23, 2016, at 5:24 AM, mara.pfleiderer at uni-ulm.de wrote:
>> 
>> Hi David,
>> 
>> I'm sorry. I'm not familiar with posting problems on helppages.
>> As the data I deal with is confidential I can't provide all details,
>> but I will try to be as precise as possible about my problem:
>> 
>> As I said I'm working on a Poisson regression model with a linear
>> predictor and the identity as link function.
> 
> Then that is not what most people would call a "Poisson regression model".
> 
>> In particular: I have data which consists of observations of two random variables X and Y over the period of about 3 months. I have clustered this data into hours, so I received 2088 observations of those variables and each observation represents the values of the variables in one hour. Now I assume Y to be Poisson distributed (Y_i~Poi(lambda_i)) and that the values of Y come from two different impacts, so I split Y_i into two Poisson-distributed variables Y_i1 and Y_i2. I assume that the values of X have a long-term effect (of at least one day) on the part Y_i1 and I have estimators for the parameters lambda_i2, but I have no exact values of the variables of Y_i1 and Y_i2 but only of Y_i as a whole.
>> 
>> So my model looks as follows:
>> fml=Y_i1 ~ b_1*X_i+....+b_n*X_(i-n+1) - lambda_i2 -1 with n>=24
>> That means that the last 24 (or more) values of X influence the value of Y_i1 in an additive way and I have no intercept(b_0=0).
>> Now I made a matrix whose rows represent Y_i, all of its 24 (or more) regressors for each observation of Y_i and the corresponding estimator lambda_i2.
>> Then I used glm(fml, family=poisson(link="identity"), data=matrix) and tried it for different values of n (=24,48,36,...).
> 
> I worry that might not construct the model you described above. The use of `poisson` constructs Poisson distributed _errors_ with an additive link, and to my understanding does _not_ assume Poisson distributed Y values.
> 
> 
>> But always some of the coefficients received negative values which doesnt't make sense in interpretation. (The values of X represent certain events which can only have a positive or none effect on the value of Y.)

One further thought. Since your errors are being constructed from a non-negative distribution, and the Y|X_i dependence is linear, then it seems perfectly reasonable that some of the estimates (which are on the linear scale) might be forced to be negative despite this not being completely sensible in real-life. The errors would be force to be "centered" around the linear estimate and so if the "true" value were small and the errors were larger, then that would "push" the estimate down into negative territory. Just one of the dangers in choosing a non-canonical choice of link and error distribution that you as an analyst needs to accept.


>> 
>> Now I want to use the constraint b_i >=0 in my model, but I don't know how I can do this.
>> 
>> I also thought of analyzing this model in a Bayesian way, but yet I haven't found a Bayesian version of glm() for the Poisson distribution such that I can specify the prior for the b_i on my own. (Then I could include the positivity in the prior.) Do you have a hint for me?
> 
> The R blogoshere has been heralding the arrival of the `rstanarm` package which provides a glm-like interface to an MCMC (Bayesian) estimation engine. Version 2.90 is on CRAN.

The version on CRAN is actually 2.90-1 and after running the script to install from GitHub, you can have the 2.90-2 version.

https://github.com/stan-dev/rstanarm

The github route will require that you have the needed development platform for your OS.

The install log showed an error with failure to "find template reference-stan-manual" but that did not stop compilation and there seemed to be a substantial numer of vignettes to consult, including one entitled: "stan_glm: GLMs for Count Data"

Best of luck.

> The list of authors in the description file is impressive: 
> 
> Authors at R: c(person("Jonah", "Gabry", email = "jsg2201 at columbia.edu", role = "aut"),
>             person("Trustees of", "Columbia University", role = "cph"),
>             person("R Core", "Deveopment Team", role = "cph", 
>                     comment = "R/pp_data.R, R/stan_aov.R"),
>             person("Douglas", "Bates", role = "cph", comment = "R/pp_data.R"),
>             person("Martin", "Maechler", role = "cph", comment = "R/pp_data.R"),
>             person("Ben", "Bolker", role = "cph", comment = "R/pp_data.R"),
>             person("Steve", "Walker", role = "cph", comment = "R/pp_data.R"),
>             person("Brian", "Ripley", role = "cph", 
>                    comment = "R/stan_aov.R, R/stan_polr.R"),
>             person("William", "Venables", role = "cph", comment = "R/stan_polr.R"),
>             person("Ben", "Goodrich", email = "benjamin.goodrich at columbia.edu", 
>                    role = c("cre", "aut")))
> 
>> 
>> I'm sorry if I went too much into detail now...
>> I hope you understand my point now and have some answers for me!
>> 
>> Best,
>> Mara
>> 
>> 
>> Zitat von David Winsemius <dwinsemius at comcast.net>:
>> 
>>> 
>>>> On Jan 22, 2016, at 7:01 AM, mara.pfleiderer at uni-ulm.de wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I am dealing with a problem about my linear Poisson regression   model (link function=identity).
>>>> 
>>>> I am using the glm()-function which results in negative   coefficients, but a negative influence of the regressors wouldn't   make sense.
>>> 
>>> Negative coefficients merely indicate a lower relative rate. You   need to be more specific about the exactly data and model output   before you can raise our concern to a level where further comment   can be made.
>>> 
>>> 
>>>> 
>>>> (i) Is there a possibility to set constraints on the regression   parameters in glm() such that all coefficients are positive? Or is   there another function in R for which this is possible?
>>>> 
>>>> (ii) Is there a Bayesian version of the glm()-function where I can   specify the prior distribution for my regression parameters? (e.g.   a Dirichlet prior s.t. the parameters are positive)
>>>> 
>>>> All this with respect to the linear Poisson model...
>>> 
>>> As I implied above, the word "linear" means something different than   "additive" when the link is log().
>>> 
>>> --
>>> 

David Winsemius
Alameda, CA, USA



More information about the R-help mailing list