[R] physical constraint with gam

Simon Wood simon.wood at bath.edu
Sat May 14 09:24:36 CEST 2016


On 12/05/16 02:29, Dominik Schneider wrote:
> Hi again,
> I'm looking for some clarification on 2 things.
> 1. On that last note, I realize that s(x1,x2) would be the other 
> obvious interaction to compare with - and I see that you recommend 
> te(x1,x2) if they are not on the same scale.
- yes that's right, s(x1,x2) gives an isotropic smooth, which is usually 
only appropriate if x1 and x2 are naturally on the same scale.

> 2. If s(x1,by=x1) gives you a "parameter" value similar to a GLM when 
> you plot s(x1):x1, why does my function above return the same yhat as 
> predict(mdl,type='response') ? Shouldn't each of the terms need to be 
> multiplied by the variable value before applying 
> rowSums()+attr(sterms,'constant') ??
predict returns s(x1)*x1 (plot.gam just plots s(x1), because in general 
s(x1,by=x2) is not smooth). If you want to get s(x1) on its own you need 
to do something like this:

x2 <- x1 ## copy x1
m <- gam(y~s(x1,by=x2)) ## model implementing s(x1,by=x1) using copy of x1
predict(m,data.frame(x1=x1,x2=rep(1,length(x2))),type="terms") ## now 
predicted s(x1)*x2 = s(x1)

best,
Simon

> Thanks again
> Dominik
>
> On Wed, May 11, 2016 at 10:11 AM, Dominik Schneider 
> <Dominik.Schneider at colorado.edu 
> <mailto:Dominik.Schneider at colorado.edu>> wrote:
>
>     Hi Simon, Thanks for this explanation.
>     To make sure I understand, another way of explaining the y axis in
>     my original example is that it is the contribution to snowdepth
>     relative to the other variables (the example only had fsca, but my
>     actual case has a couple others). i.e. a negative s(fsca) of -0.5
>     simply means snowdepth 0.5 units below the intercept+s(x_i), where
>     s(x_i) could also be negative in the case where total snowdepth is
>     less than the intercept value.
>
>     The use of by=fsca is really useful for interpreting the marginal
>     impact of the different variables. With my actual data, the term
>     s(fsca):fsca is never negative, which is much more intuitive. Is
>     it appropriate to compare magnitudes of e.g. s(x2):x2 / mean(x2)
>     and s(x2):x2 / mean(x2)  where mean(x_i) are the mean of the
>     actual data?
>
>     Lastly, how would these two differ: s(x1,by=x2); or
>     s(x1,by=x1)*s(x2,by=x2) since interactions are surely present and
>     i'm not sure if a linear combination is enough.
>
>     Thanks!
>     Dominik
>
>
>     On Wed, May 11, 2016 at 3:11 AM, Simon Wood <simon.wood at bath.edu
>     <mailto:simon.wood at bath.edu>> wrote:
>
>         The spline having a positive value is not the same as a glm
>         coefficient having a positive value. When you plot a smooth,
>         say s(x), that is equivalent to plotting the line 'beta * x'
>         in a GLM. It is not equivalent to plotting 'beta'. The smooths
>         in a gam are (usually) subject to `sum-to-zero'
>         identifiability constraints to avoid confounding via the
>         intercept, so they are bound to be negative over some part of
>         the covariate range. For example, if I have a model y ~ s(x) +
>         s(z), I can't estimate the mean level for s(x) and the mean
>         level for s(z) as they are completely confounded, and
>         confounded with the model intercept term.
>
>         I suppose that if you want to interpret the smooths as glm
>         parameters varying with the covariate they relate to then you
>         can do, by setting the model up as a varying coefficient
>         model, using the `by' argument to 's'...
>
>         gam(snowdepth~s(fsca,by=fsca),data=dat)
>
>
>         this model is `snowdepth_i = f(fsca_i) * fsca_i + e_i' .
>         s(fsca,by=fsca) is not confounded with the intercept, so no
>         constraint is needed or applied, and you can now interpret the
>         smooth like a local GLM coefficient.
>
>         best,
>         Simon
>
>
>
>
>         On 11/05/16 01:30, Dominik Schneider wrote:
>
>             Hi,
>             Just getting into using GAM using the mgcv package. I've
>             generated some
>             models and extracted the splines for each of the variables
>             and started
>             visualizing them. I'm noticing that one of my variables is
>             physically
>             unrealistic.
>
>             In the example below, my interpretation of the following
>             plot is that the
>             y-axis is basically the equivalent of a "parameter" value
>             of a GLM; in GAM
>             this value can change as the functional relationship
>             changes between x and
>             y. In my case, I am predicting snowdepth based on the
>             fractional snow
>             covered area. In no case will snowdepth realistically
>             decrease for a unit
>             increase in fsca so my question is: *Is there a way to
>             constrain the spline
>             to positive values? *
>
>             Thanks
>             Dominik
>
>             library(mgcv)
>             library(dplyr)
>             library(ggplot2)
>             extract_splines=function(mdl){
>                sterms=predict(mdl,type='terms')
>                datplot=cbind(sterms,mdl$model) %>% tbl_df
>                datplot$intercept=attr(sterms,'constant')
>              datplot$yhat=rowSums(sterms)+attr(sterms,'constant')
>                return(datplot)
>             }
>             dat=data_frame(snowdepth=runif(100,min =
>             0.001,max=6.7),fsca=runif(100,0.01,.99))
>             mdl=gam(snowdepth~s(fsca),data=dat)
>             termdF=extract_splines(mdl)
>             ggplot(termdF)+
>                geom_line(aes(x=fsca,y=`s(fsca)`))
>
>                     [[alternative HTML version deleted]]
>
>             ______________________________________________
>             R-help at r-project.org <mailto:R-help at r-project.org> mailing
>             list -- To UNSUBSCRIBE and more, see
>             https://stat.ethz.ch/mailman/listinfo/r-help
>             PLEASE do read the posting guide
>             http://www.R-project.org/posting-guide.html
>             and provide commented, minimal, self-contained,
>             reproducible code.
>
>
>
>         -- 
>         Simon Wood, School of Mathematics, University of Bristol BS8
>         1TW UK
>         +44 (0)117 33 18273 <tel:%2B44%20%280%29117%2033%2018273>
>         http://www.maths.bris.ac.uk/~sw15190
>         <http://www.maths.bris.ac.uk/%7Esw15190>
>
>
>


-- 
Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
+44 (0)117 33 18273     http://www.maths.bris.ac.uk/~sw15190


	[[alternative HTML version deleted]]



More information about the R-help mailing list