[R] choosing an appropriate linear model

Fri Jun 6 00:25:37 CEST 2008

I am trying to model the observed leaching of wood preservative chemicals
from treated wood during an outdoor experiment where leaching is caused by
rainfall events.  For each rainfall event, the amount of rainfall was
recorded as well as the amount of preservative chemical leached.  A number
of climatic variables were measured, but the most important is the amount of
rainfall.

I have tried a simple linear model, with zero intercept because zero
rainfall cannot cause any leaching (leachdata dataframe is attached to this
email).  The diagnostics show clearly non-normally distributed residuals
with a simple linear regression, and I am trying to figure out what to do
about it (see attached diagnostics.png).  This dataset contains measurements
from 57 rainfall events on three replicate samples, for a total of 171
measurements.

Part of the problem is that physically, the leaching values can only be
positive, so for the smaller rainfall amounts the residuals are all
positive.  If I allow an intercept then it is significantly positive,
possibly since the researcher wouldn't have collected measurements for very
small rain events, but in terms of the model it doesn't make sense
physically to have a positive intercept, particularly since lab experiments
have shown that a certain amount of rain exposure is required to wet the
wood before leaching begins.

I can get more normally distributed residuals by log-transforming the
response, or using the optimal box-cox transformation of lambda = 0.21,
which produces nicer-looking residuals but unsatisfactory prediction which
is the main goal of the model (also attached).

Any advice on how to create a better predictive model?  I presume it has
something to do with glm, especially since I have repeated rainfalls on
replicate samples, but any advice on the approach to take would be much
appreciated.  The code I used to produce the attached plots is included
below.

leach.lm <- lm(leachate~rainmm-1,data=leachdata)

png("dianostics.png",height=1200,width=700)
par(mfrow=c(3,2))
plot(leachate~rainmm,data=leachdata,main="Data and fitted line")
abline(leach.lm)
plot(predict(leach.lm)~leachdata$leachate,main="predicted vs. observed
leaching amount",xlim=c(0,12),ylim=c(0,12),xlab="observed
leaching",ylab="predicted leaching")
abline(a=0,b=1)
plot(leach.lm)
dev.off()

library(MASS)
boxcox(leach.lm,plotit=T,lambda=seq(0,0.4,by=0.01))

boxtran <- function(y,lambda,inverse=F){
  if(inverse)
    return((lambda*y+1)^(1/lambda))
  else
    return((y^lambda-1)/lambda)
}

png("boxcox-dianostics.png",height=1200,width=700)
par(mfrow=c(3,2))
logleach.lm <- lm(boxtran(leachate,0.21)~rainmm-1,data=leachdata)
plot(leachate~rainmm,data=leachdata,main="Data and fitted line")
x <- leachdata$rainmm
y <- boxtran(predict(logleach.lm),0.21,T)
xy <- cbind(x,y)[order(x),]
lines(xy)
plot(y~leachdata$leachate,xlim=c(0,12),ylim=c(0,12),main="predicted vs.
observed leaching amount",xlab="observed leaching",ylab="predicted
leaching")
abline(a=0,b=1)
plot(logleach.lm)
dev.off()
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dianostics.png
Type: image/png
Size: 105394 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20080605/52cb7cfe/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: boxcox-dianostics.png
Type: image/png
Size: 106915 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20080605/52cb7cfe/attachment-0001.png>