[R] A question on modeling brain growth using GAM

Thu Apr 6 14:22:05 CEST 2017

If 'subjIndexF' is a factor for subject, then s(subjIndexF, bs="re") 
will produce a random effect for subject. i.e. each subject will be 
given its own random intercept term, which is a way that repeated 
measures data like this are often handled.

The reason for the s(subjIndexF, bs="re") syntax is that smooths can be 
viewed as Gaussian random effects, so simple Gaussian random effects can 
also be viewed as (0-dimensional) smooths. In general s(x,z,w,bs="re") 
just appends the columns of model.matrix(~x:z:w-1) to the gam model 
matrix, and treats the associated coefficients as i.i.d. Gaussian random 
effects with a common variance (to be estimated). In principle this 
works with any number of arguments to s(...,bs="re").

See ?random.effects (and its linked help files) in mgcv for more.

There are mechanisms for allowing random smooth curves for each subject, 
(e.g. ?factor.smooth.interaction), but I would only use these if simpler 
approaches really aren't adequate here.

best,
Simon

On 30/03/17 17:06, David Winsemius wrote:
>> On Mar 30, 2017, at 6:56 AM, Leon Lee <bhamlion78 at gmail.com> wrote:
>>
>> David
>>
>> Thank you for your reply. I apologize if I posted in the wrong forum, as I really couldn't decide which forum is the best place for my question and I saw similar questions asked before in this forum.
>>
>> I agree that a sample of ~30 subjects (70 scans in total), the model can be too complicated. Based on that, I did the following:
>> (1) ignored the gender effect, as we have less females than males.
>> (2) corrected chronological age based on their gestational age, that is, we subtracted an infant's chronological age by 2 weeks, if the infant's gestational age is 38 weeks instead of 40weeks.
>>
>> When I ran the model with corrected age, gestational age and their interactions modeled, I found the main effect of gestational age and the interaction between the two are gone.
>>
>> So, my final model will look something like this:
>> gamObj=gam(brainVolume~ s(correctedAge) +  s(subjIndexF, bs="re"), method="REML", data=mydata)
>>
>> Does this look more reasonable?
> I'm still having difficulty understand how a "smoothing" function would be used to handle repeated measures without some sort of "group-within" indicator.
>
> I would have imagined (and this is because I have no experience with using this package for repeated measures) something along the lines of:
>
>   ...+s(correctedAg|subjIndexF)
>
> I see this statement in the docs:
>
>
> smooth.construct.re.smooth.spec {mgcv}	
>
> "gam can deal with simple independent random effects, by exploiting the link between smooths and random effects to treat random effects as smooths. s(x,bs="re") implements this."
>
> But I don't see that as applying to the dependency between individuals measured repeatedly. I find no examples of repeated measures problems being solve by gam(). There is also a note on the same page:
>
> "Note that smooth ids are not supported for random effect terms. Unlike most smooth terms, side conditions are never applied to random effect terms in the event of nesting (since they are identifiable without side conditions)."
>
> When I do a search on "using gam mgcv formula mixed effects" I am referred to packages 'gamm' and 'gamm4' produced by the same author (Simon Wood) as pkg 'mgcv', or to package `nlme`.
>
>
>> Yes, I am relatively new to the mixed model. We originally applied functional data analysis (PACE) on the data, but want to see the results using a different approach. Also, I couldn't find the Mixed Models list, do you mind sending me a link?
> This is a link to the main mailing lists page:
>
> https://www.r-project.org/mail.html
>
> Found with a search on Google with "R mailing lists"
>
>
>> Thank you!
>> Longchuan
>>
>>
>> On Tue, Mar 28, 2017 at 4:28 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>>> On Mar 28, 2017, at 9:32 AM, Leon Lee <bhamlion78 at gmail.com> wrote:
>>>
>>> Hi, R experts
>>>
>>> I am new to R & GAM toolbox and would like to get inputs from you all on my
>>> models. The question I have is as follows:
>>> I have 30 subjects with each subject being scanned from one to three times
>>> in the first year of life. The brain volume from each scan was measured.
>>> The scan time was randomly distributed from birth to 1 year.
>>> Each subject has different gestational age ranging from 38 to 41 weeks
>>> Each subject has chronological age from birth to 1 year old
>>> Each subject has gender category.
>>> Now, I want to look at how predictors, such as subject's chronological age,
>>> gestational age and gender will explain the changes in brain volume. I also
>>> want to include interactions between gender and age, gestational and
>>> chronological age. Random effects are also included in the model to account
>>> for subject variability. My model looks like the follows:
>>>
>>> gam=gam(brainVolume~ s(age) + ti(age, gestationalAge) + gestationalAge +
>>> sex + s(age, by=sex) +  s(subjIndexF, bs="re"), method="REML", data=mydata)
>>>
>>> Are there any obvious mistakes in the model? Any suggestions will be
>>> greatly appreciated!
>> I'm not seeing mistakes in the syntax but I would question whether 30 subjects is sufficient to adequately support estimates in a a model of this complexity. I would also think that the 's(age)' and 'sex' terms would get aliased out in a model with "+ s(age, by=sex)". Most R regression functions handle removal of over-parametrization automatically.
>>
>> You also have a variable number of measurements per subject. I am unable to comment on the effort to account for the implicit and variably measured correlation and auto-correlation of values within subjects using a "smooth" on subjIndexF, since that is not an approach I was familiar with.  But I am getting concerned whether you are also new to statistical modeling in addition to your use of R and GAM being "new to you"?
>>
>> (Perhaps Simon or one of the mixed-effects experts can correct the gaps in my understanding of how to model repeated measures in the context of small numbers of subjects and irregular emasurements.)
>>
>> Please read the Posting Guide and the pages of candidate mailing lists. Rhelp is not really the place to go when you need statistical advice. I'm not sure if this is really in the center of concerns that get discussed on the Mixed Models list, but to my eyes it would be a better fit there.
>>
>> --
>> David.
>>> L
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> David Winsemius
>> Alameda, CA, USA
>>
>>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
+44 (0)117 33 18273     http://www.maths.bris.ac.uk/~sw15190