[R] A question about using “by” in GAM model fitting of interaction between smooth terms and factor

Wed May 6 09:50:45 CEST 2009

The problem here is that the help page you are looking at appears to be from 
an earlier version of `mgcv' than you are using (it's from a version that did 
not support factor `by' variables). Take a look at ?gam.models for the 
version that you are actually using. 

Your models give the same fit because, when `z' is a factor, ~z and ~z-1 
differ only in the identifiability constraints used (this holds for all 
linear type models). 
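
To see the point about constraints in isolation, here is a minimal sketch 
using plain `lm' on simulated data (the data and the object names `fit_int' 
and `fit_noint' are made up for illustration, not taken from the original 
analysis). The two fits have different coefficients but identical fitted 
values.

```r
# Simulated data: a three-level factor and a response that depends on it.
set.seed(1)
z <- factor(rep(c("a", "b", "c"), each = 20))
y <- c(1, 3, 5)[as.integer(z)] + rnorm(60, sd = 0.5)

fit_int   <- lm(y ~ z)      # intercept plus treatment contrasts
fit_noint <- lm(y ~ z - 1)  # one coefficient per factor level

# The coefficients are parameterised differently...
print(coef(fit_int))
print(coef(fit_noint))

# ...but both models span the same space, so the fitted values agree.
same_fit <- isTRUE(all.equal(fitted(fit_int), fitted(fit_noint)))
print(same_fit)
```

The same reasoning carries over to the parametric part of a `gam' formula, 
which is why dropping the intercept does not change the fit when `z' is a 
factor.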

As far as model reasonableness is concerned: it's a bit difficult to say 
without knowing the context. The only thing that stands out is that you are 
using an isotropic `s' term for the interaction. This is fine if `byear' 
and `FAFR' are naturally on the same scale, but if not, tensor product 
smooths (`te') may be preferable, as they are independent of the relative 
scaling of the variables. For plot interpretability, I'd drop the `main 
effect' smooths and just leave in the interaction.
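
As a rough sketch of that suggestion, the model could be written with a 
single tensor product interaction per level of the factor. The data below 
are simulated only so that the code runs, and the Gaussian family is an 
assumption, since the original post does not say which family was used. 
Only the variable names (`FLBS', `SES', `FAFR', `byear') come from the 
post.

```r
library(mgcv)

# Simulated stand-in data; ranges and effect sizes are invented.
set.seed(2)
n <- 300
dat <- data.frame(
  SES   = factor(sample(c("low", "high"), n, replace = TRUE)),
  FAFR  = runif(n, 15, 35),
  byear = runif(n, 1700, 1900)
)
dat$FLBS <- 2 + 0.1 * (dat$FAFR - 25) +
  0.005 * (dat$byear - 1800) + rnorm(n)

# Factor main effect plus one tensor product smooth per level of SES.
# The smooths are centred, so the SES term carries the level means.
fit_te <- gam(FLBS ~ SES + te(FAFR, byear, by = SES), data = dat)

summary(fit_te)
# plot(fit_te, pages = 1)  # one surface per SES level
```

With this form each plot panel is the whole `FAFR' by `byear' surface for 
one level of `SES', rather than a surface that has to be read together with 
separate main effect curves.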

best,
Simon 

On Tuesday 05 May 2009 16:53, willow1980 wrote:
> I am a little bit confused about the following help message on how to fit
> a GAM model with interaction between factor and smooth terms from
> http://rss.acs.unt.edu/Rdoc/library/mgcv/html/gam.models.html:
> “Sometimes models of the form:
> E(y)=b0+f(x)z
> need to be estimated (where f is a smooth function, as usual.) The
> appropriate formula is:
> y~z+s(x,by=z)
> - the by argument ensures that the smooth function gets multiplied by
> covariate z, but GAM smooths are centred (average value zero), so the z+
> term is needed as well (f is being represented by a constant plus a centred
> smooth). If we'd wanted:
> E(y)=f(x)z
> then the appropriate formula would be: y~z+s(x,by=z)-1.”
> When I tried two scripts, I found they gave the same results. That is, the
> codes “y~z+s(x,by=z)” and “y~z+s(x,by=z)-1” gave the same results. The
> following is my result:
> ###########################################################################
> “anova(model1,model2,test="Chisq")
> Analysis of Deviance Table
>
> Model 1: FLBS ~ SES + s(FAFR, by = SES) + s(byear, by = SES) + s(FAFR,
>     byear, by = SES)
> Model 2: FLBS ~ SES + s(FAFR, by = SES) + s(byear, by = SES) + s(FAFR,
>     byear, by = SES) - 1
>    Resid. Df Resid. Dev         Df  Deviance P(>|Chi|)
> 1 1.2076e+03     1458.4
> 2 1.2076e+03     1458.4 1.9099e-11 5.030e-10 2.074e-10”
> ###########################################################################
> Is this in conflict with above statement that “If we'd wanted: E(y)=f(x)z
> then the appropriate formula would be: y~z+s(x,by=z)-1.”? Also, if you are
> familiar with GAM modelling, please have a look at my modelling process.
> That is, I want to study how one factor together with two smooth terms will
> influence the response. In model2, I also fitted the interaction between
> two smooth terms, together with the interaction of this interaction with
> factor. Is model 2 reasonable? I find it is rather complicated to interpret
> the plot of model 2.
> Thank you very much for helping!

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283