[R] mgcv pack- gam() - question about the by argument

Gustaf Granath Gustaf.Granath at ebc.uu.se
Tue Oct 28 13:48:18 CET 2008

Thanks!! The R-community is just amazing.

Two follow-up questions:

1) IF, I had sufficient amount of data. How would that interaction term 
(Site x Time) be specified? Maybe like this? (adding stuff on the 
previous example)
And I should look at the "Parametric coefficients:" for significant 
interaction terms, right?

#Add two more measurements
#Model with Site x Time - correct? Or should the Time factor be 
incorporated in the 'by' argument also (if possible?)??

2) The Parametric coefficients estimates: are they like intercepts or 
what do they represent here (when we have factor levels involved)? Does 
a significant difference translate into different means for the factor 
levels compared?



Simon Wood wrote:
> Gustaf,
> Your `by' variable use looks quite correct to me. The time issue is slightly 
> tricky with relatively few data. You could set up a factor variable for the 
> site:time interaction (i.e. one level for each combination of site and time) 
> but that would be pushing it a bit for this number of data. 
> Simon
> On Wednesday 22 October 2008 13:05, Gustaf Granath wrote:
>> R fellows,
>> I hope my questions are not too much about statistical methodology.
>> I do lab experiments where y is measured at different x, using
>> (plant)material originating from different sites (2-4). The response is
>> non-linear in a way that I have turned to gam() (mgcv package) to
>> investigate the shape of the response (maybe there are better ways). I
>> want to test if the shape/pattern of the response differ between sites
>> (i.e. should each site have its own curve). I was thinking (after I read
>> Wood's reference manual) to use the by= argument and fit one model
>> including the factor site and one without site, and then compare these
>> models with AIC (or can the anova.gam() be trusted here?). Is this a
>> correct interpretation and use of the by= argument??
>> I have also measured the same sample twice (weeks between) . What about
>> including the factor Time using the by= argument to test if the curve
>> differ between the two measurements (in a separate analysis looking each
>> site individually)? I guess this will ignore the correlation within
>> samples but maybe it can be used to get a rough idea?
>> Here is an example ( I know that sample size is low for a gam)
>> #Create data
>> y<-c(0.283,0.185,0.134,0.727,0.087,0.568,0.569,0.156,0.189,0.738,0.526,0.144
>> ,0.532,0.128,0.162,0.772,0.789,0.798,0.756,0.738,0.768,0.778,0.791,0.776,
>> 0.808,
>> 0.756,0.742,0.744,0.758,0.774,0.768,0.780,0.782,0.789,0.756,0.805,0.800,
>> 0.004,0.790,0.781,0.757,0.780,0.744,0.783,0.774)
>> x<-c(45.1,43,48.5,40,47.1,42.8,40.3,45.3,45.6,37.5,38,46.1,40.5,40.2,46.5,
>> 27.3,24.9,27.7,24.8,26.6,25.1,23.5,26.3,26.6,27.6,
>> 26.3,24.4,25.8,26.7,28,36,33.3,35.2,35.1,33.9,34.7,34.4,33.4,34.1,34.3,31,
>> 33.2,33.4,35.7,32.6) Site<-rep(1:3,each=5,3)
>> mydata<-data.frame(y=y,x=x,Site=as.factor(Site))
>> library(mgcv)
>> #Model without site
>> mod.no.site<-gam(y~s(x),data=mydata)
>> #Model including the factor Site. id=1 to get the same smothing
>> parameter at each factor level.
>> mod.with.site<-gam(y~Site+s(x,by=Site,id=1),data=mydata)
>> #AIC for the two models
>> AIC(mod.no.site,mod.with.site)
>> Thanks,
>> Gustaf Granath (phd student)
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html and provide commented, minimal,
>> self-contained, reproducible code.

Gustaf Granath (PhD student)

Dept of Plant Ecology
Evolutionary Biology Centre (EBC)
Uppsala University
Villavägen 14, SE - 752 36 Uppsala, Sweden

Tel: +46 (0)18-471 76 88
Fax: 018 55 34 19
Email: Gustaf.Granath at ebc.uu.se

More information about the R-help mailing list