[R] Log-transform and specifying Gamma
Ben Bolker
bbolker at gmail.com
Fri Nov 18 05:03:19 CET 2011
Peter Minting <peter_minting <at> hotmail.com> writes:
>
>
> Dear R help,
> I am trying to work out if I am justified in
> log-transforming data and specifying Gamma in the same glm.
> Does it have to be one or the other?
No, but I've never seen it done.
> I have attached an R script and the datafile to show what I mean.
> Also, I cannot find a mixed-model that allows Gamma errors
> (so I cannot find a way of including random effects).
> What should I do?
> Many thanks,
> Pete
>
>
ToadsBd<-read.table("Bd.txt",header=TRUE,
colClasses=c(rep("factor",2),rep("numeric",3),"factor"))
with(ToadsBd,table(group,site))
## 47 toads, 3 groups per site
library(ggplot2)
library(mgcv)
## plot points, add linear regressions per group/site
ggplot(ToadsBd,aes(x=startg,y=logBd,colour=group))+
geom_point()+facet_grid(.~site)+geom_smooth(method="lm")
## not much going on with startg ... PERHAPS
## similar slopes across sites?
ggplot(ToadsBd,aes(x=site:group,y=logBd,colour=site))+
geom_boxplot()+geom_point()
## I'm curious -- I thought the groups were just blocking
## factors, but maybe not? The patterns of group 1, 2, 3
## are consistent across sites ...
## take a quick look at the raw data ...
ggplot(ToadsBd,aes(x=site:group,y=Bd,colour=site))+
geom_boxplot()+geom_point()
mod1 <- lm(logBd~group*site*startg,data=ToadsBd)
summary(mod1)
oldpar <- par(mfrow=c(2,2))
plot(mod1)
par(oldpar)
## we definitely have to take care of the heteroscedasticity ...
library(MASS)
boxcox(mod1)
## square root transform ... ??
ToadsBd <- transform(ToadsBd,sqrtLogBd=sqrt(logBd))
mod2 <- lm(sqrtLogBd~group*site*startg,data=ToadsBd)
oldpar <- par(mfrow=c(2,2))
plot(mod2)
par(oldpar)
## still not perfect, but perhaps OK
library(coefplot2)
coefplot2(mod2)
mod3 <- update(mod2,.~.-group:site:startg)
coefplot2(mod3)
drop1(mod3,test="F")
mod4 <- update(mod3,.~group+site+startg)
coefplot2(mod4)
## look at results on new (transformed) scale
ggplot(ToadsBd,aes(x=site:group,y=sqrtLogBd,colour=site))+
geom_boxplot()+geom_point()
## Conclusions:
## don't mess around with random effects for only three groups in two sites
## I have done a fair amount of stepwise selection, so the p-values
## really can't be taken seriously, but it was clear from the
## beginning that there were differences among groups, which *seem*
## to be consistent among sites (which really makes me wonder what
## the groups are. (The weak effect of site might well go away
## once one took the effect of snooping into account ...)
## sqrt(log(x)) seems to be adequate to get reasonably
## homogeneous variances, but it really is a very strong transformation.
## It makes the results somewhat hard to interpret. Alternatively,
## you could just look at a nonparametric test (e.g. Kruskal-Wallis
## on site:group), but nonparametric tests make it hard to
## add lots of structure to the model
More information about the R-help
mailing list