[R] mgcv gam/bam model selection with random effects and AR terms

Sat Apr 8 16:24:48 CEST 2017

I would be grateful for advice on gam/bam model selection when the model incorporates random effects and autoregressive terms.

I have a multivariate time series recorded on ~500 subjects at ~100 time points.  One of the variables (A) is the dependent and four others (B to E) are predictors.  My basic formula is:

[model 1]: bam(A ~ s(time)+s(B)+s(C)+s(D)+s(E))

I've then included a random intercept and a random slope for time, both by subject (id), as the pattern of A over time is highly variable across subjects.

[model 2]: bam(A ~ s(time)+s(B)+s(C)+s(D)+s(E)+s(id, bs='re')+s(id,time, bs='re'))

I expect there is also potential for autocorrelation within the time series. So:

[model 3]: bam(A ~ s(time)+s(B)+s(C)+s(D)+s(E)+s(id, bs='re')+s(id,time, bs='re'), AR.start = startindex, rho = 0.52)
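For reference, AR.start expects a logical vector that is TRUE at the first observation of each independent series (here, each subject), with rows in time order within subject. A minimal sketch of how such a vector could be built, using a hypothetical toy data frame `dat` that only stands in for the real data:

```r
# Toy data frame (hypothetical; stands in for the real data)
dat <- data.frame(id   = factor(rep(1:3, each = 4)),
                  time = rep(1:4, times = 3))

# The AR(1) structure assumes rows are in time order within each subject
dat <- dat[order(dat$id, dat$time), ]

# TRUE at the first row of each subject, FALSE elsewhere
dat$startindex <- !duplicated(dat$id)
```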

The rho value of 0.52 was settled on by trial and error, minimising fREML/ML. (Side question: am I correct in understanding that bam can only use a fixed rho, rather than treating it as a parameter to optimise as gamm does?)
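A search of this kind might look like the sketch below: refit the model over a grid of rho values and keep the one with the lowest fREML score. The simulated data, the reduced formula and all object names (`dat`, `rhos`, `freml`, `best_rho`) are hypothetical stand-ins and do not reproduce the characteristics of the real data set.

```r
library(mgcv)
set.seed(1)

# Simulated stand-in data (hypothetical): 20 subjects, 30 time points each
n_id <- 20
n_t  <- 30
dat  <- expand.grid(time = 1:n_t, id = factor(1:n_id))  # ordered by id, then time
dat$B <- runif(nrow(dat))

# AR(1) noise generated separately for each subject (true rho = 0.5)
e <- unlist(lapply(seq_len(n_id), function(i) {
  as.numeric(arima.sim(list(ar = 0.5), n = n_t))
}))
dat$A <- sin(dat$time / 5) + 2 * dat$B + e

# TRUE at the first observation of each subject
dat$startindex <- dat$time == 1

# Refit over a grid of fixed rho values and record the fREML score
rhos  <- seq(0, 0.9, by = 0.1)
freml <- sapply(rhos, function(r) {
  m <- bam(A ~ s(time) + s(B) + s(id, bs = "re"),
           data = dat, method = "fREML",
           AR.start = dat$startindex, rho = r)
  m$gcv.ubre  # the minimised smoothing-selection score (fREML here)
})

# rho with the lowest fREML score on the grid
best_rho <- rhos[which.min(freml)]
```

The fixed-effects structure is held constant across the grid, so the scores differ only through rho and the smoothing parameters.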

The lowest fREML or ML values are obtained with model 3 (71674, vs 72099 for model 2), but the highest adjusted R2/deviance explained is with model 2 (42.1%, vs 37.7% for model 3).  Model 1 is inferior to both the others on all measures.

Is it better to select the model including the AR term, given its lower ML, or is it legitimate to go with the 'simpler' model 2, which has the higher R2/deviance explained?

I am unable to provide a fully reproducible example as I don't know how to generate sample data with these specific characteristics.

Many thanks