[R] glm or transformation of the response?

Joshua Wiley jwiley.psych at gmail.com
Sat Jan 7 23:44:03 CET 2012


Hi Emily,

This is the R-help forum; it is for R questions, not basic
statistics.  You should check out http://stats.stackexchange.com/ for
those types of questions.  glm(log(y) ~ x, poisson(link = "identity"))
is not the same as glm(y ~ x, poisson(link = "log")), so I am not
surprised you are getting different results: the first models the mean
of log(y) as linear in x, while the second models the log of the mean
of y.  An identity link and data transformations do not inherently
violate assumptions.  Depending on why you have a 'bunch of zeroes', I
might consider a zero-inflated model or censored regression.
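
To make the difference between transforming the response and using a
log link concrete, here is a minimal sketch on simulated data.  The
data-generating model, the sample size, and the names fit_link and
fit_trans are assumptions made purely for illustration; they are not
taken from your data.  A Gaussian family is used for the transformed
fit because log(y + 1) is no longer a count.

```r
# Simulated counts whose true mean is log-linear in x
# (an assumption for this illustration only).
set.seed(1)
x <- seq(0, 2, length.out = 200)
y <- rpois(200, lambda = exp(0.5 + 1.0 * x))

# Log link: models log(E[y]) = b0 + b1 * x on the untransformed counts.
fit_link <- glm(y ~ x, family = poisson(link = "log"))

# Transformed response: models E[log(y + 1)] = b0 + b1 * x instead.
fit_trans <- glm(log(y + 1) ~ x, family = gaussian(link = "identity"))

# The coefficients answer different questions, so they need not agree.
coef(fit_link)
coef(fit_trans)

# These AIC values are computed on different response scales
# (y versus log(y + 1)) and are not directly comparable.
AIC(fit_link)
AIC(fit_trans)
```

Because the two likelihoods are defined for different response
variables, a lower AIC for the transformed fit does not by itself show
that it is the better model.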

For more in-depth discussion, I would suggest heading over to Stack
Exchange and providing more details about your data and model.

Cheers,

Josh

On Sat, Jan 7, 2012 at 8:54 AM, emily <ebell545 at gmail.com> wrote:
> Hi Dr. Snow,
>
>
>
> I am a graduate student working on analyzing data for my thesis and came
> across your post on an R forum:
>
>
>
> The default link function for the glm poisson family is a log link, which
> means that it is fitting the model:
>
> log(mu) ~ b0 + b1 * x
>
> But the data that you generate is based on a linear link.  Therefore your
> glm analysis does not match with how the data was generated (and therefore
> should not necessarily be the best fit).  Either analyze using glm and a
> linear link, or generate the data based on a log link (e.g. rpois(40,
> exp(seq(1,3, length.out=40))) ).
>
> Hope this helps,
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>
> I am not using R at the moment (working in SPSS, have to love the GUI) but
> my question is quite related:
>
> I am running a generalized linear model on data highly skewed to the right
> with a bunch of zeroes, so I decided to use the Tweedie distribution. In the
> model I ran both untransformed data (with link=log) as well as log(x+1)
> transformed data (with link=identity). The latter model had a much smaller
> (more negative) AICc value than the untransformed data with link=log.
>
> Is it valid to run the GLM with log(x+1) transformed data if link=identity?
> Or am I violating some kind of assumption about the model?
>
> I really appreciate any advice or thoughts! It seems as if my go-to
> statistician has taken a loooong break and any help would be greatly valued!
>
>
>
> -Emily Bellush
>
> QTGR at IUP.EDU
>
> Indiana University of Pennsylvania
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


