[R] physical constraint with gam

Thu May 12 03:29:40 CEST 2016

Hi again,
I'm looking for some clarification on 2 things.
1. On that last note, I realize that s(x1,x2) would be the other obvious
interaction to compare with - and I see that you recommend te(x1,x2) if
they are not on the same scale.
2. If s(x1,by=x1) gives you a "parameter" value similar to a GLM when you
plot s(x1):x1, why does my function above return the same yhat as
predict(mdl,type='response') ?  Shouldn't each of the terms need to be
multiplied by the variable value before applying
rowSums()+attr(sterms,'constant') ??
Thanks again
Dominik

On Wed, May 11, 2016 at 10:11 AM, Dominik Schneider <
Dominik.Schneider at colorado.edu> wrote:

> Hi Simon, Thanks for this explanation.
> To make sure I understand, another way of explaining the y axis in my
> original example is that it is the contribution to snowdepth relative to
> the other variables (the example only had fsca, but my actual case has a
> couple others). i.e. a negative s(fsca) of -0.5 simply means snowdepth 0.5
> units below the intercept+s(x_i), where s(x_i) could also be negative in
> the case where total snowdepth is less than the intercept value.
>
> The use of by=fsca is really useful for interpreting the marginal impact
> of the different variables. With my actual data, the term s(fsca):fsca is
> never negative, which is much more intuitive. Is it appropriate to compare
> magnitudes of e.g. s(x2):x2 / mean(x2) and s(x2):x2 / mean(x2)  where
> mean(x_i) are the mean of the actual data?
>
> Lastly, how would these two differ: s(x1,by=x2); or
> s(x1,by=x1)*s(x2,by=x2) since interactions are surely present and i'm not
> sure if a linear combination is enough.
>
> Thanks!
> Dominik
>
>
> On Wed, May 11, 2016 at 3:11 AM, Simon Wood <simon.wood at bath.edu> wrote:
>
>> The spline having a positive value is not the same as a glm coefficient
>> having a positive value. When you plot a smooth, say s(x), that is
>> equivalent to plotting the line 'beta * x' in a GLM. It is not equivalent
>> to plotting 'beta'. The smooths in a gam are (usually) subject to
>> `sum-to-zero' identifiability constraints to avoid confounding via the
>> intercept, so they are bound to be negative over some part of the covariate
>> range. For example, if I have a model y ~ s(x) + s(z), I can't estimate the
>> mean level for s(x) and the mean level for s(z) as they are completely
>> confounded, and confounded with the model intercept term.
>>
>> I suppose that if you want to interpret the smooths as glm parameters
>> varying with the covariate they relate to then you can do, by setting the
>> model up as a varying coefficient model, using the `by' argument to 's'...
>>
>> gam(snowdepth~s(fsca,by=fsca),data=dat)
>>
>>
>> this model is `snowdepth_i = f(fsca_i) * fsca_i + e_i' . s(fsca,by=fsca)
>> is not confounded with the intercept, so no constraint is needed or
>> applied, and you can now interpret the smooth like a local GLM coefficient.
>>
>> best,
>> Simon
>>
>>
>>
>>
>> On 11/05/16 01:30, Dominik Schneider wrote:
>>
>>> Hi,
>>> Just getting into using GAM using the mgcv package. I've generated some
>>> models and extracted the splines for each of the variables and started
>>> visualizing them. I'm noticing that one of my variables is physically
>>> unrealistic.
>>>
>>> In the example below, my interpretation of the following plot is that the
>>> y-axis is basically the equivalent of a "parameter" value of a GLM; in
>>> GAM
>>> this value can change as the functional relationship changes between x
>>> and
>>> y. In my case, I am predicting snowdepth based on the fractional snow
>>> covered area. In no case will snowdepth realistically decrease for a unit
>>> increase in fsca so my question is: *Is there a way to constrain the
>>> spline
>>> to positive values? *
>>>
>>> Thanks
>>> Dominik
>>>
>>> library(mgcv)
>>> library(dplyr)
>>> library(ggplot2)
>>> extract_splines=function(mdl){
>>>    sterms=predict(mdl,type='terms')
>>>    datplot=cbind(sterms,mdl$model) %>% tbl_df
>>>    datplot$intercept=attr(sterms,'constant')
>>>    datplot$yhat=rowSums(sterms)+attr(sterms,'constant')
>>>    return(datplot)
>>> }
>>> dat=data_frame(snowdepth=runif(100,min =
>>> 0.001,max=6.7),fsca=runif(100,0.01,.99))
>>> mdl=gam(snowdepth~s(fsca),data=dat)
>>> termdF=extract_splines(mdl)
>>> ggplot(termdF)+
>>>    geom_line(aes(x=fsca,y=`s(fsca)`))
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>> --
>> Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
>> +44 (0)117 33 18273     http://www.maths.bris.ac.uk/~sw15190
>>
>>
>

	[[alternative HTML version deleted]]