[R] Why is the number of coefficients varying? [mgcv][gam]

Simon Wood s.wood at bath.ac.uk
Tue Jan 29 15:21:48 CET 2013


Andrew,

I think the odd EDFs may result from an unhandled case in the
side-constraint calculation. In particular, the term

s(BCAR.imp,bs="cr",k=length(BCAR.knots),by=as.factor(pot.trial))

is confounded with

te(soc.imp,BCAR.imp,k=c(4,4))

but, because of the factor 'by' variable, this confounding was not always picked up properly; a change in 1.7-13 should have fixed that. If it wasn't picked up, the fitting routines would have noticed the lack of identifiability later and dealt with it, which is how the terms can end up with no degrees of freedom at all (instead of the 1 EDF that would be expected for a fully penalized term).
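A quick first check on each machine is whether the installed mgcv already includes the 1.7-13 change mentioned above. A minimal R sketch, assuming 1.7-13 is the relevant cutoff (taken from this message, not checked against the mgcv changelog); `has_by_factor_fix` is a hypothetical helper name:

```r
## Hypothetical helper: TRUE if an mgcv version is at or above 1.7-13,
## the release said above to fix detection of confounding with factor 'by' smooths.
has_by_factor_fix <- function(v) {
  ## package_version() compares numerically, component by component, so
  ## "1.7.3" correctly sorts below "1.7.13" (a plain string comparison would not).
  package_version(as.character(v)) >= package_version("1.7-13")
}

## mgcv versions reported for the three machines in this thread
versions <- c(local = "1.7.22", cluster = "1.7.3", server = "1.7.12")
sapply(versions, has_by_factor_fix)  # local TRUE, cluster FALSE, server FALSE

## On the machine that actually runs the fit (mgcv is normally installed with R)
if (requireNamespace("mgcv", quietly = TRUE)) {
  print(has_by_factor_fix(packageVersion("mgcv")))
}
```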

In itself, however, this would not explain the difference in the number of coefficients. The obvious explanations there are that some factor levels have been dropped for some replicates, or that for some replicates some smooth arguments do not have enough unique values to allow the specified number of knots, in which case the number of knots will be reduced. The only way to get further, I think, is to compare coef(mod) for a couple of models that differ in their number of coefficients, and find the labels of the coefficients that have been dropped.
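The comparison suggested above takes only a few lines of R. In this minimal sketch, `coef_diff` and `short_of_unique` are hypothetical helper names, and the demonstration uses `lm()` on the built-in `mtcars` data rather than the gam fits from this thread. The same code should apply to mgcv fits, since `coef()` returns labelled coefficients there too (smooth terms get labels such as `s(x).1`, `s(x).2`, ...); a missing factor-level label would point to a dropped level, and missing trailing indices on a smooth to a reduced basis.

```r
## Hypothetical helper: coefficient labels present in one fitted model but not
## the other. Works with any model that has a coef() method.
coef_diff <- function(mod_a, mod_b) {
  a <- names(coef(mod_a))
  b <- names(coef(mod_b))
  list(only_in_a = setdiff(a, b), only_in_b = setdiff(b, a))
}

## Hypothetical helper: covariates whose number of unique values is below the
## basis dimension k requested for them (k is a named integer vector).
short_of_unique <- function(data, k) {
  n_unique <- vapply(names(k), function(v) length(unique(data[[v]])), integer(1))
  n_unique[n_unique < k]
}

## Illustration: dropping every 8-cylinder car removes a factor level,
## and with it one coefficient.
full <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
sub  <- lm(mpg ~ factor(cyl) + wt, data = subset(mtcars, cyl != 8))
coef_diff(full, sub)$only_in_a   # "factor(cyl)8"

## carb has only 6 distinct values in mtcars, so asking for k = 8 is flagged
short_of_unique(mtcars, c(wt = 8L, carb = 8L))
```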

best,
Simon

On 29/01/13 00:20, Andrew Crane-Droesch wrote:
> Hi Simon,
>
> Thanks for replying.
>
> On further investigation, I can't reproduce this error on my local
> machine; it only occurs when I send the job to a cluster I have access
> to (to run the multiple imputations in parallel). When I send it to a
> friend's web server instead, I get the same sort of error as on the
> cluster (but a different set of results!). The seed is set identically
> on all three machines. gam.check indicates convergence after 16
> iterations locally, but after 21 iterations on both remote machines.
> Both remote machines also penalize the random effects and the first,
> second and fourth spline terms effectively to zero (res.df ~1e-7).
>
> I then checked versions. My local machine has mgcv 1.7.22 and R 2.15.1,
> the cluster has mgcv 1.7.3 and R 2.12.2, and the server has mgcv 1.7.12
> and R 2.14.1. I updated R on the server, and that fixed the problem.
> I'll see whether the people who manage the cluster can update it too.
>
> The tryCatch is there because my imputation models that feed the gams 
> are not bug-free.
>
> Thanks anyway for replying.
>
> Best,
> Andrew
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


