[R] Problem extracting enough coefs from gam (mgcv package)

Tue Apr 24 11:22:57 CEST 2012

Hi Simon,

Thanks for your quick reply. I'm now running the model again with mgcv
1.7-13. This might take some time (half a day or so) as the dataset is
quite large (112,608 rows).
The call I've used was (I've simplified some variable names):

model = bam(LingDist ~ s(Lon,Lat) + VowelRatio + IsDem + WordLength +
SpBirthYear + IsAragon + SpBirthYear_IsAragon + PopCnt +
s(Word,bs="re") + s(Speaker,bs="re") + s(Word,SpBirthYear,bs="re") +
s(Word,IsAragon,bs="re") + s(Word,PopCnt,bs="re") +
s(Speaker,VowelRatio,bs="re") + s(Speaker,IsDem,bs="re") +
s(Speaker,WordLength,bs="re") + s(Word,Tourism,bs="re") +
s(Word,PopAge,bs="re")+ s(Word,PopIncome,bs="re") +
s(Word,SpEdu,bs="re") + s(Word,SpBirthYear_IsAragon,bs="re"),
data=dat)

I'll post the results w.r.t. the random slopes.
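
For that report, the number of coefficients per random-effect term could be tabulated roughly as follows. This is only a sketch on made-up data: `toy`, `m` and `n_coef` are illustrative names, not part of my real analysis, and the fitted bam object would take the place of `m`.

```r
library(mgcv)

set.seed(2)
# Made-up data (assumption): 6 words, 30 rows each, one numeric covariate
toy <- data.frame(Word = factor(rep(paste0("w", 1:6), each = 30)),
                  x = rnorm(180))
toy$y <- as.numeric(toy$Word) + 0.5 * as.numeric(toy$Word) * toy$x + rnorm(180)

# Random intercept and random slope for x, both by Word
m <- gam(y ~ s(Word, bs = "re") + s(Word, x, bs = "re"),
         data = toy, method = "REML")

# Each smooth stores the index range of its coefficients in coef(m)
n_coef <- sapply(m$smooth, function(s) s$last.para - s$first.para + 1)
names(n_coef) <- sapply(m$smooth, function(s) s$label)
print(n_coef)
```

Any term whose count is below the number of words would then stand out directly.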

Is my procedure for assigning labels, when the number of slope
estimates equals the number of words, correct: rownames(slopes) =
unique(dat[,c("Word")])?
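
A quick check on made-up data makes me doubt the unique() part. As far as I understand, the "re" basis builds its columns from the factor, so the coefficients follow the order of levels(Word) rather than the order in which words first appear in the data. The sketch below assumes a toy data frame `toy` with known intercepts `true_int` (both hypothetical) and that Word is stored as a factor.

```r
library(mgcv)

set.seed(1)
# Made-up data (assumption): 5 words, deliberately not in alphabetical order
words <- c("delta", "alpha", "echo", "bravo", "charlie")
toy <- data.frame(Word = factor(rep(words, each = 40)))
true_int <- c(alpha = -2, bravo = -1, charlie = 0, delta = 1, echo = 2)
toy$y <- unname(true_int[as.character(toy$Word)]) + rnorm(200, sd = 0.1)

m <- gam(y ~ s(Word, bs = "re"), data = toy, method = "REML")

# Pull out the coefficients that belong to the random-intercept term
sm <- m$smooth[[1]]
ints <- coef(m)[sm$first.para:sm$last.para]

# Label by factor levels; unique() would give order of first appearance
names(ints) <- levels(toy$Word)
print(round(ints, 2))
print(as.character(unique(toy$Word)))
```

In this toy case the estimates line up with the known intercepts only when labelled with levels(toy$Word); unique() would attach "delta" to the first coefficient. The two agree only if the data happen to be sorted by factor level.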

With kind regards,
Martijn

On 24/04/12 10:50, Simon Wood wrote:
> Martijn,
>
> It's a bit hard to know without seeing the full model structure, but
> it's possible that the issue is related to an undesirable side effect of
> the handling of identifiability constraints on smooth terms, prior to
> mgcv 1.7-13: the standard side constraint approach used for smooths
> could lead to unexpected constraints being applied to s(...,bs="re")
> terms in some cases.
>
> So, could you send me the gam call that generates the problem, and
> perhaps try out if it still happens in 1.7-13?
>
> best,
> Simon
>
> On 23/04/12 18:26, Martijn Wieling wrote:
>> Dear useRs,
>>
>> I have been using the excellent mgcv package (version 1.7-12) to
>> create a generalized additive model (gam) including random effects -
>> represented with s(...,bs="re") - on the basis of dialect data.
>>
>> My model contains two random-effect factors (Word and Key - the latter
>> representing a speaker) and I have added both random intercepts and
>> various random slopes for these random-effect factors. There is no
>> missing data in my dataset. When I try to extract the by-word random
>> intercepts from my model, using coef(model), I find 357 values, equal
>> to the number of words in my dataset. Using coef(model) I get
>> uninformative names: s(Word,1) until s(Word,357), but I'm assuming (I
>> might be wrong though?) that I can link the labels of the words to
>> these values by obtaining the 357 labels from the original dataset:
>> unique(dat[,c("Word")])
>>
>> Unfortunately, I cannot use this procedure to label the by-word random
>> slopes, because I find a varying number of values for these (ranging
>> from 346 to 356) which is always less than 357. (The number of
>> by-speaker random slopes does equal the number of speakers, though.)
>>
>> Does anybody i) have an idea why I obtain fewer by-word random slopes
>> than words, and/or ii) how I can link the random slopes which are
>> present to the correct labels of the words?
>>
>> (I did not include the model as it is >300 MB in size, but let me know
>> if this is necessary.)
>>
>> Any help would be greatly appreciated!
>>
>> With kind regards,
>> Martijn Wieling
>> University of Groningen
>> http://www.martijnwieling.nl
>>