[R] GAM selection error msgs (mgcv & gam packages)

Simon Wood sw283 at maths.bath.ac.uk
Wed Jun 21 18:09:43 CEST 2006


>
> My question concerns 2 error messages: one in the gam package and one in
> the mgcv package (see below). I have read the help files and the Chambers and
> Hastie book but am failing to understand how I can solve this problem.
> Could you please tell me what I must adjust so that the command does not
> generate an error message?
>
> I am trying to achieve model selection for a GAM which is required for
> prediction purposes, thus my focus is on AIC. My data set has 3038 records
> and 116 predictor variables and a binary response variable [0 or 1]. There
> is no current understanding of the predictors' relationship to the response,
> so I am relying on GAM for selection of appropriate predictors.

- I have some worries about using a GAM in this sort of situation. It 
seems like an odd model to start from: you don't know the 
relationship to the covariates, but you do know that it should be additive? 
Is that really true? If it is, then it may still be a lot to ask of the 
model selection methods to find a good model. (I'd certainly consider 
upping the `gamma' parameter in mgcv::gam.)
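
As a rough illustration of the `gamma' suggestion, the sketch below fits the same binomial GAM twice, once with the default and once with gamma raised above 1, so the effective degrees of freedom can be compared. The simulated data, the variable names x1 and x2, and the value 1.4 are assumptions chosen for the example, not taken from the original data set.

```r
library(mgcv)

# Hypothetical simulated data standing in for the real data set
set.seed(2)
n  <- 400
x1 <- runif(n)
x2 <- runif(n)                        # x2 has no real effect on the response
label <- rbinom(n, 1, plogis(2 * sin(2 * pi * x1) - 0.5))
dat <- data.frame(label, x1, x2)

# Default smoothness selection (gamma = 1)
fit.default <- gam(label ~ s(x1) + s(x2), family = binomial, data = dat)

# gamma > 1 makes each effective degree of freedom count for more in the
# smoothness selection criterion; 1.4 is an illustrative value only
fit.heavier <- gam(label ~ s(x1) + s(x2), family = binomial, data = dat,
                   gamma = 1.4)

# Total effective degrees of freedom of each fit, for comparison
c(default = sum(fit.default$edf), heavier = sum(fit.heavier$edf))
```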

- General uneasiness apart, the specific error message relates to the 
number of distinct covariate values that you have (or number of distinct 
x,y,z triplets). Do any of the covariates for single smooths have fewer 
than 10 distinct values? There are more than 50 distinct x,y,z triplets, I 
suppose? If you have fewer distinct covariate points for a smooth than the 
default k (10), then you need to reduce k to the number of distinct 
points, or fewer.
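
The check described above can be sketched in base R as follows. The data frame `demo' and its columns are hypothetical stand-ins for k.data; only the counting logic is the point.

```r
# Hypothetical stand-in for k.data: one covariate has very few distinct values
set.seed(1)
n <- 200
demo <- data.frame(
  x = runif(n), y = runif(n), z = runif(n),
  feature4 = sample(1:4, n, replace = TRUE),   # at most 4 distinct values
  feature5 = rnorm(n)
)

# Number of distinct values of each single covariate
n.unique <- sapply(demo, function(v) length(unique(v)))

# Number of distinct (x, y, z) triplets, relevant to s(x, y, z, k = 50)
n.triplets <- nrow(unique(demo[, c("x", "y", "z")]))

# Covariates that cannot support the default k = 10 for a single smooth
too.few <- names(n.unique)[n.unique < 10]
too.few
```

Any covariate listed in too.few would need its own k in the formula, for example s(feature4, k = 4) here, or smaller.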

- Finally, for speed reasons, I'd use the "cr" basis (see ?s) if doing 
this.
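
A minimal sketch of what switching basis looks like, on simulated data with hypothetical variable names. Note that "cr" only handles smooths of a single covariate, so the three-dimensional location smooth stays on the default thin plate basis here.

```r
library(mgcv)

# Hypothetical simulated data; f4 stands in for one of the feature columns
set.seed(3)
n <- 300
x <- runif(n); y <- runif(n); z <- runif(n); f4 <- runif(n)
label <- rbinom(n, 1, plogis(sin(2 * pi * f4) + x - y))
dat <- data.frame(label, x, y, z, f4)

# s(x, y, z) keeps the default "tp" basis; the 1-D smooth uses bs = "cr"
fit.cr <- gam(label ~ s(x, y, z, k = 50) + s(f4, bs = "cr"),
              family = binomial, data = dat)

# Inspect which basis class each smooth term ended up with
sapply(fit.cr$smooth, function(s) class(s)[1])
```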

best,
Simon

>- Simon Wood, Mathematical Sciences, University of Bath, Bath BA2 7AY 
>-             +44 (0)1225 386603         www.maths.bath.ac.uk/~sw283/


>
> Thanks
> Savrina
>
> *mgcv package 1.3-12:
>
> # I start with specifying the full model with 116 predictors including
> isotropic smooth of 3D location variables (when I specify only the first
> 14 predictors I get no error message)
>>
> m0<-gam(label~s(x,y,z,k=50)+s(feature4)+s(feature5)+s(feature6)+...+s(feature116),data=k.data,
> family=binomial)
>
> Error in smooth.construct.tp.smooth.spec(object, data, knots):
>     A term has fewer unique covariate combinations than specified maximum
> degrees of freedom
>
> # I was going to follow this with backwards selection by hypothesis testing
> (remove highest p-val term one at a time) and also AIC comparison of all
> the models
>
> From the help file entitled 'Generalised additive models with integrated
> smoothness estimation' I calculated the following. Where do I go from here?
> A) "k is the basis dimension of a given term...if k is not specified
> k=10*3^(d-1) where 'd' is the number of covariates for this term"
> My calculations: for all my terms but the first d=1 thus k=10*3^0=10.
> B) "You must have more unique combinations of covariates than the model has
> total parameters"
> My calculations: total parameters = sum of basis dimensions(50+10*113) +
> sum of non-spline terms(0) - number of spline terms(114) = 1066
>
> *gam package:
> I think the stepwise selection provided by the gam package would be useful
> in finding the best predictive model. I follow the example on pg 283 of
> 'Statistical models in S', Chambers and Hastie 1993.
> # I start with a full model where all predictors enter linearly
>> k.start<-gam(label~., data=k.data, family=binomial)
>
> # set up scope list with possibilities for each term eg .~1 + x + s(x)
> # ignore the first column of the data set
>> k.scope<-gam.scope(k.data[,-1])
>
> # start step wise selection
>> k.step<-step(k.start,k.scope)
> #condensed output
> Start: AIC=1549.48
> label~x+y+z+feature4+feature5+...+feature116
>                Df     Deviance       AIC
> <none>                 1319.5         1549.5
> - feature54     -1     1319.2         1551.2
> - feature26     -1     1319.2         1551.2
> ...
> - feature12     -1     1357.4         1589.4
> There were 50 or more warnings (use warnings() to see the first 50)
>
> # all 50 warnings are the same
>> warnings()
> Warning messages:
> 1: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x[, jj,
> drop = FALSE], y, wt, offset = object$offset,   ...
>
> # it seems not to get past the original linear model. It should show all
> the steps taken to the final model
>> k.step$anova
>  Step Df Deviance Resid. Df Resid. Dev      AIC
> 1      NA       NA      2922   1317.599 1549.599
>


