[R] (gam) formula: Why different results for terms being factor vs. numeric?

Bert Gunter gunter.berton at gene.com
Tue Oct 29 21:31:00 CET 2013


Think about it. How can one define a smooth term with a factor???

Further discussion is probably offtopic. Post on
stats.stackexchange.com if it still isn't obvious.

Cheers,
Bert

On Tue, Oct 29, 2013 at 1:16 PM, Marius Hofert
<marius.hofert at math.ethz.ch> wrote:
> Dear expeRts,
>
> If I specify group = as.factor(rep(1:2, each=n)) in the below
> definition of dat, I get the expected behavior I am looking for. I
> wonder why I
> don't get it if group is *not* a factor... My guess was that,
> internally, factors are treated as natural numbers (and this indeed
> seems to be true if you convert the latter to factors [essentially
> meaning changing the levels]), but replacing factors by numeric values
> (as below) does not provide the same answer.
>
> Cheers,
> Marius
>
>
> require(mgcv)
>
> n <- 10
> yrs <- 2000+seq_len(n)
> set.seed(271)
> dat <- data.frame(year  = rep(yrs, 2),
>                   group = rep(1:2, each=n), # *not* a factor
> (as.factor() provides the expected behavior)
>                   resp  = c(seq_len(n)+runif(n), 5+seq_len(n)+runif(n)))
> fit3 <- gam(resp ~ year + group - 1, data=dat)
> plot(yrs, fit3$fitted.values[seq_len(n)], type="l", ylim=range(dat$resp),
>      xlab="Year", ylab="Response") # fit group A; mean over all
> responses in this group
> lines (yrs, fit3$fitted.values[n+seq_len(n)], col="blue") # fit group
> B; mean over all responses in this group
> points(yrs, dat$resp[seq_len(n)]) # actual response group A
> points(yrs, dat$resp[n+seq_len(n)], col="blue") # actual response group B
> ## => hmmm... because it is not a factor (?), this does not give an
> expected answer,
> ##    but gam() still correctly figures out that there are two groups
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374



More information about the R-help mailing list