[R] Strange error returned or bug in gam in mgcv????

Simon Wood s.wood at bath.ac.uk
Wed Sep 2 13:25:36 CEST 2009


I'm afraid that mgcv::gam can't cope with a data set of this size combined 
with a model of this complexity. The model matrix alone for your first model 
would require around 3 terabytes of storage. For the simplest additive model 
the model matrix is `only' 1.6 GB, but that's before you do anything with it....
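
The storage figures above can be reproduced with some back-of-envelope arithmetic. The following is only a rough sketch: it takes the roughly 2 million observations and the basis dimension of 196830 mentioned elsewhere in this thread, and it assumes that the default basis dimension of a d-dimensional thin plate smooth is 10 * 3^(d - 1), that the penalty order m is the smallest integer with 2m > d + 1, and that the "simplest additive model" means ten univariate smooths of about 10 columns each. The helper names are made up for illustration.

```r
# Back-of-envelope storage for a dense double-precision model matrix.
n <- 2e6                 # roughly the number of observations in the thread
bytes_per_double <- 8    # R stores numeric values as 8-byte doubles

# Assumption: default basis dimension for a d-dimensional thin plate smooth.
tp_default_k <- function(d) 10 * 3^(d - 1)   # hypothetical helper name
d <- 10
k_full <- tp_default_k(d)                    # 196830, as quoted in the thread

# Assumption: m is the smallest integer with 2m > d + 1; the unpenalized
# (null) space of the penalty then has dimension choose(m + d - 1, d).
m <- floor((d + 1) / 2) + 1                  # 6 when d = 10
null_dim <- choose(m + d - 1, d)             # 3003, as quoted in the thread

# Hypothetical helper: bytes needed for an n-by-p matrix of doubles.
matrix_bytes <- function(n, p) n * p * bytes_per_double

full_tb     <- matrix_bytes(n, k_full) / 1e12   # one 10-d smooth, in TB
additive_gb <- matrix_bytes(n, 10 * 10) / 1e9   # ten 1-d smooths, in GB

# n * k is also far larger than the biggest integer R can represent, which
# would be consistent with the coercion warning raised from array(0, n * k).
exceeds_int <- n * k_full > .Machine$integer.max

print(c(full_tb = full_tb, additive_gb = additive_gb))
```

Both sizes use decimal units and ignore identifiability constraints and the intercept, so they are order-of-magnitude estimates only.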

For the *generalized* additive model case, any method that does smoothness 
selection requires several times the storage of the model matrix. Currently 
`mgcv' is not very economical with storage (especially on the R side), and 
there is some room for improvement, but not enough room to get anywhere close 
to the size of the problem that you are looking at. I'm investigating methods 
for reducing the memory requirement, but 2 million observations looks like a 
bit of a stretch, at present.

That said, methods already exist for the additive model that you want to fit. 
See e.g. 
http://www.maths.bath.ac.uk/~sw283/talks/huge.pdf
... the key point is that there is no need to ever form the model matrix 
explicitly in the purely additive case. But unfortunately there is no easy to 
use code for this, as yet....
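
To illustrate the general idea of never holding the whole model matrix in memory, here is a minimal sketch in base R. It is not the method from the talk linked above, and it is unpenalized, so it is not a smoothing method either: it simply accumulates X'X and X'y block by block for a small additive regression-spline fit. The function `block_basis`, the knot positions, the block size and the simulated data are all assumptions made for the example.

```r
library(splines)  # ships with R; provides bs()

set.seed(1)
n  <- 10000
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.1)

# Hypothetical helper: model matrix rows for one block of data. Fixed interior
# and boundary knots guarantee the same columns in every block.
block_basis <- function(x1, x2) {
  knots <- c(0.25, 0.5, 0.75)
  cbind(1,
        bs(x1, knots = knots, Boundary.knots = c(0, 1)),
        bs(x2, knots = knots, Boundary.knots = c(0, 1)))
}

p   <- 13                 # intercept + 6 columns per cubic B-spline term
XtX <- matrix(0, p, p)
Xty <- numeric(p)
block_size <- 1000

# Only block_size rows of the model matrix exist at any one time.
for (start in seq(1, n, by = block_size)) {
  idx <- start:min(start + block_size - 1, n)
  Xb  <- block_basis(x1[idx], x2[idx])
  XtX <- XtX + crossprod(Xb)            # accumulate t(Xb) %*% Xb
  Xty <- Xty + crossprod(Xb, y[idx])    # accumulate t(Xb) %*% y
}
beta_block <- as.numeric(solve(XtX, Xty))

# Check against a fit that does form the full matrix (feasible here only
# because n is small in this toy example).
beta_full <- as.numeric(qr.coef(qr(block_basis(x1, x2)), y))
max_diff  <- max(abs(beta_block - beta_full))
print(max_diff)
```

Working storage is of order p^2 plus one block, rather than n * p. A penalty matrix could be added to XtX before solving, and updating a QR factorization block by block would be numerically more stable than forming X'X, but both are left out to keep the sketch short.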

best,
Simon
 

On Tuesday 01 September 2009 17:55, Corrado wrote:
> Dear Simon,
>
> I have stored all information at the link:
>
> http://scsys.co.uk:8002/33309?hl=on&submit=Format+it!
>
> I have the same problem if I do
> s(PC1)  + ..... + s(PC10) or
> s(PC1,PC2,PC3,PC4,PC5)+s(PC6,PC7,PC8,PC9,PC10) or
> s(PC1,PC2,PC3,PC6,PC7,PC8) .....
>
> I have renamed PC1.1,PC2.1,PC3.1,PC4.1,PC5.1 to PC6,PC7,PC8,PC9,PC10 for
> simplicity.
>
> Regards
>
> On Tuesday 01 September 2009 17:31:04 Simon Wood wrote:
> > The basic problem is that you have requested a 10 dimensional thin plate
> > spline, with a basis dimension of 196830. In reality it will not be
> > possible to compute this, even if you have more than 196830 data. In any
> > case it would be unlikely to provide a very useful model --- the
> > "simplest" function that it can theoretically represent will have 3003
> > degrees of freedom.
> >
> > That said the error message is obviously rather unhelpful... Can you tell
> > me how many data you are actually trying to fit, and I'll try and track
> > down exactly where it's failing, and put in a more informative message.
> >
> > best,
> > Simon
> >
> > On Tuesday 01 September 2009 14:51, Corrado wrote:
> > > Dear friends,
> > >
> > > what is this error message in gam???? I cannot understand what it means
> > > .... is it a bug?
> > >
> > > gam_bray_scot24_pc_0505 <- gam(bray~s(PC1,PC2,PC3,PC4,PC5,
> > > PC1.1,PC2.1,PC3.1,PC4.1,PC5.1),data=dist_scot24_vector_with_climate)
> > >
> > > Error in if (length(data) != vl) { :
> > >   missing value where TRUE/FALSE needed
> > > Calls: gam ... smooth.construct -> smooth.construct.tp.smooth.spec -> array
> > > In addition: Warning message:
> > > In array(0, n * k) : NAs introduced by coercion
> > > Execution halted
> > >
> > > Thanks in advance,
> > >
> > > Best regards

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283



