[R] te( ) interactions and AIC model selection with GAM

Mon Jul 30 17:50:01 CEST 2012

Hello R users,

I'm working with a time series spanning several years and, to analyze it, I'm using
GAM smoothers from the package mgcv. I'm constructing models where
zooplankton biomass (bm) is the dependent variable and the continuous
explanatory variables are:
- time in Julian days (t), to create a long-term linear trend;
- Julian day of the year (t_year), to create an annual cycle;
- mean winter temperature (temp_W), September temperature (temp_sept)
  or Chla.
Questions:
1) To introduce a tensor product that modifies the annual cycle in my model, I
tried two different approaches:
- a) gam(bm ~ t + te(t_year, temp_W, temp_sept, k = c(5, 30), d = c(1, 2),
  bs = c("cc", "cr")), data = data)
- b) gam(bm ~ t + te(t_year, temp_W, temp_sept, k = 5,
  bs = c("cc", "cr", "cr")), data = data)
Here is my problem: when I use just two variables (e.g., t_year and
temp_W) in the tensor product, I understand pretty well how the
interpolation works, and I can visualize it with vis.gam() as a 3-D plot or a
contour plot. With three variables, however, it is difficult for me to understand
how it works. Besides, I don't know which one is the proper way to construct it,
a) or b). Finally, when I plot a) or b) with vis.gam(model_name, view = c("t_year",
"temp_W")), how should I interpret the plot? Does it show the effect of temp_W on
the annual cycle after already accounting for the effect of temp_sept, or just the
individual effect of temp_W on the annual cycle?
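For readers who want to reproduce the setup, here is a minimal sketch of both specifications with valid R syntax, fitted to synthetic data (hypothetical values; only the column names come from the post, and temp_W varies per row here although a real winter mean would be constant within a year). One assumption is flagged: mgcv's "cr" basis only handles one-dimensional smooths, so in version a) the two-dimensional temp_W/temp_sept margin is given a thin plate basis ("tp") instead. The vis.gam() call uses its `cond` argument to state explicitly the value at which temp_sept (and t) is held while t_year and temp_W vary.

```r
library(mgcv)  # gam(), te(), vis.gam()

# Hypothetical synthetic data with the post's column names (not real data)
set.seed(1)
n <- 400
t <- sort(sample(0:3650, n))         # days since the start, ~10 years
t_year <- t %% 365 + 1               # day of the year, 1..365
temp_W <- rnorm(n, mean = 5, sd = 1)     # placeholder winter temperature
temp_sept <- rnorm(n, mean = 18, sd = 1) # placeholder September temperature
bm <- 0.0005 * t + sin(2 * pi * t_year / 365) * (1 + 0.3 * (temp_W - 5)) +
  rnorm(n, sd = 0.3)
data <- data.frame(bm, t, t_year, temp_W, temp_sept)

# a) a cyclic 1-D margin for t_year crossed with ONE joint 2-D margin for the
#    two temperatures (d = c(1, 2)); "tp" replaces "cr", which is 1-D only
mod_a <- gam(bm ~ t + te(t_year, temp_W, temp_sept, k = c(5, 30),
                         d = c(1, 2), bs = c("cc", "tp")), data = data)

# b) three separate 1-D margins, each with basis dimension k = 5
mod_b <- gam(bm ~ t + te(t_year, temp_W, temp_sept, k = 5,
                         bs = c("cc", "cr", "cr")), data = data)

# Contour slice over t_year and temp_W; temp_sept is fixed at its median via
# cond, and t (not listed) is held at an observed value near its median
vis.gam(mod_b, view = c("t_year", "temp_W"),
        cond = list(temp_sept = median(data$temp_sept)),
        plot.type = "contour")
```

Changing the value in `cond` and re-plotting shows how the t_year/temp_W surface shifts with temp_sept.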
2) I'm trying to do model selection using the AIC criterion. I have several
questions about it:
- Should I always use the same type of smoothing basis (bs), the same type
of smoother (e.g. te) and the same basis dimension (k)? Example:
Option 1:
a) mod1 <- gam(bm ~ t, data = data)
b) mod2 <- gam(bm ~ te(t, k = 5, bs = "cr"), data = data)
c) mod3 <- gam(bm ~ te(t_year, k = 5, bs = "cc"), data = data)
d) mod4 <- gam(bm ~ te(t_year, temp_W, k = 5, bs = c("cc", "cr")), data =
data)
e) mod5 <- gam(bm ~ te(t_year, temp_W, temp_sept, k = 5, bs =
c("cc", "cr", "cr")), data = data)
Here k is limited to 5 because of mod5; I don't use s() because te() is used
in mod4 and mod5; and finally, I always use "cr" and "cc".
Option 2: 
a) mod1 <- gam(bm ~ t, data = data)
b) mod2 <- gam(bm ~ s(t, k = 13, bs = "cr"), data = data)
c) mod3 <- gam(bm ~ s(t_year, k = 13, bs = "cc"), data = data)
d) mod4 <- gam(bm ~ te(t_year, temp_W, k = 11, bs = c("cc", "cr")), data =
data)
e) mod5 <- gam(bm ~ te(t_year, temp_W, temp_sept, k = 5, bs =
c("cc", "cr", "cr")), data = data)
I get a lower AIC for each of the models with Option 2, but are they
comparable when I use the AIC criterion? Is Option 1 therefore the proper way
to do it, comparing the models with AIC(mod1, mod2, mod3, mod4, mod5)?
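As a sketch of the mechanics only, the two candidate sets can be fitted and tabulated side by side with AIC(), here on hypothetical synthetic data with the post's column names. The k and bs values are copied from the two options above; mod1 and mod5 are identical in both, so they are reused. The sketch does not settle whether the two sets are comparable; it only shows that AIC() accepts several gam fits and returns a data frame with one row per model and columns df and AIC.

```r
library(mgcv)  # gam(), te(), s()

# Hypothetical synthetic data with the post's column names (not real data)
set.seed(1)
n <- 400
t <- sort(sample(0:3650, n))         # days since the start, ~10 years
t_year <- t %% 365 + 1               # day of the year, 1..365
temp_W <- rnorm(n, mean = 5, sd = 1)
temp_sept <- rnorm(n, mean = 18, sd = 1)
bm <- 0.0005 * t + sin(2 * pi * t_year / 365) * (1 + 0.3 * (temp_W - 5)) +
  rnorm(n, sd = 0.3)
data <- data.frame(bm, t, t_year, temp_W, temp_sept)

# Option 1: te() throughout, k = 5 everywhere
mod1 <- gam(bm ~ t, data = data)
mod2 <- gam(bm ~ te(t, k = 5, bs = "cr"), data = data)
mod3 <- gam(bm ~ te(t_year, k = 5, bs = "cc"), data = data)
mod4 <- gam(bm ~ te(t_year, temp_W, k = 5, bs = c("cc", "cr")), data = data)
mod5 <- gam(bm ~ te(t_year, temp_W, temp_sept, k = 5,
                    bs = c("cc", "cr", "cr")), data = data)
aic_opt1 <- AIC(mod1, mod2, mod3, mod4, mod5)

# Option 2: s() for the single-variable terms and larger k where possible
mod2b <- gam(bm ~ s(t, k = 13, bs = "cr"), data = data)
mod3b <- gam(bm ~ s(t_year, k = 13, bs = "cc"), data = data)
mod4b <- gam(bm ~ te(t_year, temp_W, k = 11, bs = c("cc", "cr")), data = data)
aic_opt2 <- AIC(mod1, mod2b, mod3b, mod4b, mod5)

print(aic_opt1)  # one row per model, columns df and AIC
print(aic_opt2)
```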

Thank you in advance,
Best regards,
Ricardo González-Gil

--
View this message in context: http://r.789695.n4.nabble.com/te-interactions-and-AIC-model-selection-with-GAM-tp4638368.html
Sent from the R help mailing list archive at Nabble.com.