[R] gam (mgcv), multiple imputation, f-stats/p-values, and summary(gam)

Wed May 8 17:59:28 CEST 2013

?summary.gam  ## The Help page

Since the Help page is presumably not enough, why don't you look at
the code?? R is open source.

summary.gam  ## at the prompt

-- Bert

On Wed, May 8, 2013 at 7:12 AM, Andrew Crane-Droesch <andrewcd at gmail.com> wrote:
> Dear All,
>
> I'm using gam for a project that involves multiple imputation, and it has
> led me to a question about how f-statistics/p-values work in gam.
> Specifically, how do the values in summary(gam) get generated?  As is made
> clear by the dumb example below, I'm manipul;ating gam objects to reflect
> the MI procedures that I'm using.  I don't trust the f-statistics/p-values
> that I'm getting, but I'd like to know how to further manipulate these
> objects to get trustworthy values.  Part of that invovles knowing how the
> output in summary(gam) gets generated, and from what elements of a gam
> object.
>
> Here is the example:
>
> library(mgcv)
> par(mfrow=c(2,2))
> impute <- function (a, a.impute){
>     ifelse (is.na(a), a.impute, a)
>     }
>
> #make some dumb fake data
> x1.knots = c(-1,-.5,0,.5,1)
> x2.knots = c(-1,-.5,0,.5,1)
> K=5
> x1 = rnorm(100)
> x2 = rnorm(100)
> y = .05*exp(x1)-.5*x1 + .1*x2 + x1*x2 + rnorm(100)
> #some of it is missing
> x1[81:100] = NA
> x2[71:90] = NA
>
> #do a few dumb imputations, and fit models to the partially-imputed data
> x1.imp = impute(x1, rnorm(100))
> x2.imp = impute(x2, rnorm(100))
> m1 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots, x2.imp
> = x2.knots))
> plot(m1,plot.type="contour",scheme=2,main="Imp 1")
>
> x1.imp = impute(x1, rnorm(100))
> x2.imp = impute(x2, rnorm(100))
> m2 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots, x2.imp
> = x2.knots))
> plot(m2,plot.type="contour",scheme=2,main="Imp 2")
>
> x1.imp = impute(x1, rnorm(100))
> x2.imp = impute(x2, rnorm(100))
> m3 = gam(y~te(x1.imp,x2.imp,k=c(K,K)),knots = list(x1.imp = x1.knots, x2.imp
> = x2.knots))
> plot(m3,plot.type="contour",scheme=2,main="Imp 3")
>
> results = list(m1,m2,m3)
>
> #Combine the results according to rubin's rules
> reps=3
> bhat=results[[1]]$coeff
> for (i in 2:reps){
>     bhat=bhat+results[[i]]$coeff
>     }
> bhat = bhat/reps
>
> W=results[[1]]$Vp
> for (i in 2:reps){
>     W = W+results[[i]]$Vp
>     }
> W = W/reps
> B= (results[[1]]$coeff-bhat) %*% t(results[[1]]$coeff-bhat)
> for (i in 2:reps){
>     B = B+(results[[i]]$coeff-bhat) %*% t(results[[i]]$coeff-bhat)
>     }
> B=B/(reps-1)
> Vb = W+(1+1/reps)*B
>
> #Put the results into a convenient gam object
> MI = results[[1]]
> MI$coefficients=bhat
> MI$Vp = Vb
>
> #and summarize
> summary(MI)
> plot(MI,plot.type="contour",scheme=2,main="MI")
>
> When I do something similar with non-fake data I get wacky f-statistics and
> p-values.  For example, F could be >100 and p could equal 1.  This probably
> has something to do with degrees of freedom.
>
> My easy question:  what goes on under the hood with gam to generate these
> values?  What parts of a gam object are called up?
>
> My harder question:  how might one construct principled analogs of these
> statistics in an MI context, when degrees of freedom will vary across
> models, depending on the imputed data?  Has anybody thought about this, or
> do I have have some serious pencil-and-paper ahead of me?
>
> Thanks,
> Andrew
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm