[R] A question on modeling brain growth using GAM

Thu Apr 6 16:44:53 CEST 2017

>
> gamObj=gam(brainVolume~ s(correctedAge) +  s(subjIndexF, bs="re") + 
> s(subjIndexF, correctedAge, bs="re"), method="REML", data=mydata), 
> where subjIndexF is a factor for each subject. I was thrown an error 
> saying "more coefficients than data".
>
--- I'm not sure exactly  how many scans and subjects you have. The 
above model will have 10 + 2*(number of subjects ) coeffs. If that is 
more than the number of scans then gam will not handle it. Depending on 
the numbers involved you could reduce the k parameter to 
s(correctedAge), to fix the problem. (e.g. with 31 subjects and 70 scans 
s(correctedAge=8) should work).

> However, when I tried to model similar (please correct me if they are 
> not similar) things using GAMM based on description in 
> ?factor.smooth.interaction:
>  gamm1=gamm(BrainVolume~ s(correctedAge) + s(correctedAge, subjIndexF, 
> bs="fs", k=5), data=mydata)
--- It's not the same model. You now have a random smooth curve per 
subject. You can add random effects in gamm using the list form of the 
syntax for specifying random effects in lme. see ?gamm. Random 
intercepts and slopes can be added that way.

> The model ran.  When I plotted the data using plot(gamm1), I got two 
> figures: the left one is the group mean and 95%CI, which I assume is 
> the results by gamm1$gam model. The right one shows 30 lines (the 
> number of subjects in my data) fluctuating around 0, which I assume is 
> the random effects (gamm1$lme) modeled within each subject that can be 
> added onto the group mean for individual curves. Is my understanding 
> correct? If so, how can I extract these curves from gamm1$lme?
>
--- I would extract the fitted curves using predict(gamm1$gam,..., 
type="terms") supplying the factor levels and correctedAges at which you 
want to evaluate the curves.

> Many thanks!
> L
>
>
>
> On Thu, Apr 6, 2017 at 8:22 AM, Simon Wood <simon.wood at bath.edu 
> <mailto:simon.wood at bath.edu>> wrote:
>
>     If 'subjIndexF' is a factor for subject, then s(subjIndexF,
>     bs="re") will produce a random effect for subject. i.e. each
>     subject will be given its own random intercept term, which is a
>     way that repeated measures data like this are often handled.
>
>     The reason for the s(subjIndexF, bs="re") syntax is that smooths
>     can be viewed as Gaussian random effects, so simple Gaussian
>     random effects can also be viewed as (0-dimensional) smooths. In
>     general s(x,z,w,bs="re") just appends the columns of
>     model.matrix(~x:z:w-1) to the gam model matrix, and treats the
>     associated coefficients as i.i.d. Gaussian random effects with a
>     common variance (to be estimated). In principle this works with
>     any number of arguments to s(...,bs="re").
>
>     See ?random.effects (and its linked help files) in mgcv for more.
>
>     There are mechanisms for allowing random smooth curves for each
>     subject, (e.g. ?factor.smooth.interaction), but I would only use
>     these if simpler approaches really aren't adequate here.
>
>     best,
>     Simon
>
>
>
>
>     On 30/03/17 17:06, David Winsemius wrote:
>
>             On Mar 30, 2017, at 6:56 AM, Leon Lee
>             <bhamlion78 at gmail.com <mailto:bhamlion78 at gmail.com>> wrote:
>
>             David
>
>             Thank you for your reply. I apologize if I posted in the
>             wrong forum, as I really couldn't decide which forum is
>             the best place for my question and I saw similar questions
>             asked before in this forum.
>
>             I agree that a sample of ~30 subjects (70 scans in total),
>             the model can be too complicated. Based on that, I did the
>             following:
>             (1) ignored the gender effect, as we have less females
>             than males.
>             (2) corrected chronological age based on their gestational
>             age, that is, we subtracted an infant's chronological age
>             by 2 weeks, if the infant's gestational age is 38 weeks
>             instead of 40weeks.
>
>             When I ran the model with corrected age, gestational age
>             and their interactions modeled, I found the main effect of
>             gestational age and the interaction between the two are gone.
>
>             So, my final model will look something like this:
>             gamObj=gam(brainVolume~ s(correctedAge) + s(subjIndexF,
>             bs="re"), method="REML", data=mydata)
>
>             Does this look more reasonable?
>
>         I'm still having difficulty understand how a "smoothing"
>         function would be used to handle repeated measures without
>         some sort of "group-within" indicator.
>
>         I would have imagined (and this is because I have no
>         experience with using this package for repeated measures)
>         something along the lines of:
>
>           ...+s(correctedAg|subjIndexF)
>
>         I see this statement in the docs:
>
>
>         smooth.construct.re.smooth.spec {mgcv}
>
>         "gam can deal with simple independent random effects, by
>         exploiting the link between smooths and random effects to
>         treat random effects as smooths. s(x,bs="re") implements this."
>
>         But I don't see that as applying to the dependency between
>         individuals measured repeatedly. I find no examples of
>         repeated measures problems being solve by gam(). There is also
>         a note on the same page:
>
>         "Note that smooth ids are not supported for random effect
>         terms. Unlike most smooth terms, side conditions are never
>         applied to random effect terms in the event of nesting (since
>         they are identifiable without side conditions)."
>
>         When I do a search on "using gam mgcv formula mixed effects" I
>         am referred to packages 'gamm' and 'gamm4' produced by the
>         same author (Simon Wood) as pkg 'mgcv', or to package `nlme`.
>
>
>             Yes, I am relatively new to the mixed model. We originally
>             applied functional data analysis (PACE) on the data, but
>             want to see the results using a different approach. Also,
>             I couldn't find the Mixed Models list, do you mind sending
>             me a link?
>
>         This is a link to the main mailing lists page:
>
>         https://www.r-project.org/mail.html
>         <https://www.r-project.org/mail.html>
>
>         Found with a search on Google with "R mailing lists"
>
>
>             Thank you!
>             Longchuan
>
>
>             On Tue, Mar 28, 2017 at 4:28 PM, David Winsemius
>             <dwinsemius at comcast.net <mailto:dwinsemius at comcast.net>>
>             wrote:
>
>                 On Mar 28, 2017, at 9:32 AM, Leon Lee
>                 <bhamlion78 at gmail.com <mailto:bhamlion78 at gmail.com>>
>                 wrote:
>
>                 Hi, R experts
>
>                 I am new to R & GAM toolbox and would like to get
>                 inputs from you all on my
>                 models. The question I have is as follows:
>                 I have 30 subjects with each subject being scanned
>                 from one to three times
>                 in the first year of life. The brain volume from each
>                 scan was measured.
>                 The scan time was randomly distributed from birth to 1
>                 year.
>                 Each subject has different gestational age ranging
>                 from 38 to 41 weeks
>                 Each subject has chronological age from birth to 1
>                 year old
>                 Each subject has gender category.
>                 Now, I want to look at how predictors, such as
>                 subject's chronological age,
>                 gestational age and gender will explain the changes in
>                 brain volume. I also
>                 want to include interactions between gender and age,
>                 gestational and
>                 chronological age. Random effects are also included in
>                 the model to account
>                 for subject variability. My model looks like the follows:
>
>                 gam=gam(brainVolume~ s(age) + ti(age, gestationalAge)
>                 + gestationalAge +
>                 sex + s(age, by=sex) +  s(subjIndexF, bs="re"),
>                 method="REML", data=mydata)
>
>                 Are there any obvious mistakes in the model? Any
>                 suggestions will be
>                 greatly appreciated!
>
>             I'm not seeing mistakes in the syntax but I would question
>             whether 30 subjects is sufficient to adequately support
>             estimates in a a model of this complexity. I would also
>             think that the 's(age)' and 'sex' terms would get aliased
>             out in a model with "+ s(age, by=sex)". Most R regression
>             functions handle removal of over-parametrization
>             automatically.
>
>             You also have a variable number of measurements per
>             subject. I am unable to comment on the effort to account
>             for the implicit and variably measured correlation and
>             auto-correlation of values within subjects using a
>             "smooth" on subjIndexF, since that is not an approach I
>             was familiar with.  But I am getting concerned whether you
>             are also new to statistical modeling in addition to your
>             use of R and GAM being "new to you"?
>
>             (Perhaps Simon or one of the mixed-effects experts can
>             correct the gaps in my understanding of how to model
>             repeated measures in the context of small numbers of
>             subjects and irregular emasurements.)
>
>             Please read the Posting Guide and the pages of candidate
>             mailing lists. Rhelp is not really the place to go when
>             you need statistical advice. I'm not sure if this is
>             really in the center of concerns that get discussed on the
>             Mixed Models list, but to my eyes it would be a better fit
>             there.
>
>             --
>             David.
>
>                 L
>
>                        [[alternative HTML version deleted]]
>
>                 ______________________________________________
>                 R-help at r-project.org <mailto:R-help at r-project.org>
>                 mailing list -- To UNSUBSCRIBE and more, see
>                 https://stat.ethz.ch/mailman/listinfo/r-help
>                 <https://stat.ethz.ch/mailman/listinfo/r-help>
>                 PLEASE do read the posting guide
>                 http://www.R-project.org/posting-guide.html
>                 <http://www.R-project.org/posting-guide.html>
>                 and provide commented, minimal, self-contained,
>                 reproducible code.
>
>             David Winsemius
>             Alameda, CA, USA
>
>
>         David Winsemius
>         Alameda, CA, USA
>
>         ______________________________________________
>         R-help at r-project.org <mailto:R-help at r-project.org> mailing
>         list -- To UNSUBSCRIBE and more, see
>         https://stat.ethz.ch/mailman/listinfo/r-help
>         <https://stat.ethz.ch/mailman/listinfo/r-help>
>         PLEASE do read the posting guide
>         http://www.R-project.org/posting-guide.html
>         <http://www.R-project.org/posting-guide.html>
>         and provide commented, minimal, self-contained, reproducible code.
>
>
>
>     -- 
>     Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
>     +44 (0)117 33 18273 <tel:%2B44%20%280%29117%2033%2018273>
>     http://www.maths.bris.ac.uk/~sw15190
>     <http://www.maths.bris.ac.uk/%7Esw15190>
>
>


-- 
Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
+44 (0)117 33 18273     http://www.maths.bris.ac.uk/~sw15190


	[[alternative HTML version deleted]]