[R] Interpretation of gam intercept parameter
Gavin Simpson
gavin.simpson at ucl.ac.uk
Wed Jun 30 09:31:45 CEST 2010
On Tue, 2010-06-29 at 21:27 -0500, Lidia Dobria wrote:
> Dear All:
>
> I apologize for asking such an elementary question, but I could not
> find an adequate response on line. I am hoping to receive some help
> with the interpretation of the Intercept coefficient in the gam model
> below.
>
> I1 through I3 are dummy coded "Item difficulty" parameters in a data
> set that includes 4 items. If the Intercept is the value of Y when all
> other terms are 0, am I correct in assuming that it also equals the
> difficulty of item 4 (dummy coded 0 0 0 )?
If I understand you correctly (?), you have a single variable 'ID' (Item
Difficulty) taking four levels, 1 to 4, with item 4 as the baseline coded
0 0 0 by your three dummies. If so, you should avoid making your dummy
variables by hand and let R's formula handling sugar take care of this
for you:
fac <- factor(sample(rep(letters[1:3], each = 4))) ## 3 levels, 4 obs each, random order
resp <- rnorm(12)
num <- rnorm(12)
model.matrix(resp ~ fac + num) ## the design matrix R builds from the formula
(Intercept) facb facc num
1 1 0 1 1.22373197
2 1 0 0 -0.23893032
3 1 1 0 -0.03588385
4 1 1 0 0.39657910
5 1 0 1 0.14727398
6 1 1 0 0.59727570
7 1 0 1 -0.24968044
8 1 0 0 0.01444002
9 1 0 1 0.45514437
10 1 1 0 -0.74748326
11 1 0 0 0.89873549
12 1 0 0 1.37584734
attr(,"assign")
[1] 0 1 1 2
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.treatment"
With treatment contrasts the intercept is the mean for the reference
level (which is the first entry in levels(fac), here group a); strictly,
with num also in the model, it is the expected response for group a when
num = 0. The facb and facc entries above code for the difference in the
mean response of the b and c groups from the reference level mean (group
a). To parametrise on the group means instead, suppress the intercept:
model.matrix(resp ~ 0 + fac + num) ## or
model.matrix(resp ~ fac + num - 1)
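To check this against the original question, here is a minimal sketch on
simulated data. It uses lm() in place of gam() (an assumption, only so
that base R suffices) and leaves out the covariate so the identities are
exact. The names item, score and I1 to I3 are hypothetical, chosen to
mirror the question. It fits the same 4-item model three ways: hand-made
0/1 dummies with item 4 coded 0 0 0, the factor with level "4" as the
reference, and the no-intercept parametrisation.

```r
## Sketch: simulated data; lm() stands in for gam() to keep it base R only
set.seed(1)
item  <- factor(rep(1:4, each = 5))              # 4 items, 5 responses each
score <- rnorm(20, mean = c(5, 4, 6, 3)[item])   # hypothetical item means

## hand-made dummies, item 4 coded 0 0 0 (as in the question)
I1 <- as.numeric(item == "1")
I2 <- as.numeric(item == "2")
I3 <- as.numeric(item == "3")
fit.dummy <- lm(score ~ I1 + I2 + I3)

## the same model via the factor, with level "4" as the reference level
fit.fac <- lm(score ~ relevel(item, ref = "4"))

## no intercept: one coefficient per item, equal to the item means
fit.means <- lm(score ~ 0 + item)

grp.means <- tapply(score, item, mean)
coef(fit.dummy)[1]                 # intercept: mean score of item 4
grp.means["4"]
coef(fit.dummy)[2:4]               # I1..I3: item means minus the item-4 mean
grp.means[1:3] - grp.means["4"]
```

So, without other terms in the model, the intercept from the hand-coded
dummies is the item 4 mean, and the factor version gives identical
coefficients without any manual coding.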
This all happens within the model fitting code, so I could run gam as:
require(mgcv)
## Probably a good idea to have your data in a data frame
dat <- data.frame(fac, resp, num)
rm(resp, num, fac)
mod <- gam(resp ~ fac + s(num), data = dat)
summary(mod)
anova(mod)
Compare the summary() and anova() output: the former reports results at
the level of each coefficient (the treatment contrast info I mention
above), whilst anova() combines this information into a single test for
the entire term 'fac'.
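The same per-coefficient versus per-term distinction can be seen in base
R. This sketch uses simulated data and lm() rather than gam() (an
assumption made only to avoid needing mgcv); 'fac' is entered last so its
sequential F test is adjusted for 'num'.

```r
## Sketch with simulated data; lm() stands in for gam()
set.seed(42)
dat <- data.frame(fac = factor(rep(letters[1:3], each = 10)),
                  num = rnorm(30))
dat$resp <- 2 + c(0, 1, -1)[dat$fac] + 0.5 * dat$num + rnorm(30)

## 'fac' entered last, so its sequential test is adjusted for 'num'
mod <- lm(resp ~ num + fac, data = dat)

## summary(): one t test per coefficient (facb, facc are contrasts with 'a')
coef(summary(mod))

## anova(): a single F test for the whole 'fac' term, on 2 df
tab <- anova(mod)
tab

## the same F test, from comparing nested models with and without 'fac'
cmp <- anova(lm(resp ~ num, data = dat), mod)
cmp
```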
If you want to alter the contrasts, look at ?contrasts
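As a sketch of two common options (simulated, deliberately unbalanced
data; lm() again in place of gam()): relevel() chooses a different
reference level under treatment contrasts, and contr.sum makes the
intercept the unweighted mean of the group means, with each coefficient
a deviation from it. Both relevel() and contr.sum() are in the stats
package.

```r
## Sketch with simulated data; lm() stands in for gam()
set.seed(7)
fac  <- factor(rep(letters[1:3], times = c(4, 6, 8)))  # unbalanced groups
resp <- rnorm(18, mean = c(10, 12, 15)[fac])
grp.means <- tapply(resp, fac, mean)

## treatment contrasts with 'b' as the reference level
fit.b <- lm(resp ~ relevel(fac, ref = "b"))
coef(fit.b)[1]                     # intercept: mean of group b
grp.means["b"]

## sum-to-zero contrasts: intercept is the mean of the group means
fit.sum <- lm(resp ~ fac, contrasts = list(fac = "contr.sum"))
coef(fit.sum)[1]
mean(grp.means)
```

With unbalanced groups, mean(grp.means) differs from mean(resp), which
is why the sum-contrast intercept is described as the mean of the group
means rather than the overall mean.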
Does it help if you recode your model using R's tools?
G
>
> Thank you for your help.
> Lidia
>
>
> Family: gaussian
> Link function: identity
>
> Formula:
> Score ~ I1 + I2 + I3 + s(TimeI1, bs = "cr", k = 7) +
> s(TimeI2, bs = "cr", k = 7) + s(TimeI3, bs = "cr", k = 7)
>
> Parametric coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 4.70968 0.09547 49.330 < 2e-16 ***
> I1 -0.22188 0.21767 -1.019 0.308157
> I2 0.51236 0.16592 3.088 0.002042 **
> I3 -0.60697 0.18258 -3.324 0.000902 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Approximate significance of smooth terms:
> edf Ref.df F p-value
> s(TimeI1) 3.820 3.820 4.587 0.001331 **
> s(TimeI2) 2.491 2.491 6.271 0.000784 ***
> s(TimeI3) 3.481 3.481 8.997 1.54e-06 ***
> ---
>
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> R-sq.(adj) = 0.057 Scale est. = 2.131 n = 2079
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%