[R] something wrong when using pspline in clogit?

Wed Jan 22 12:15:08 CET 2003

Dear R users:

I am not entirely convinced that clogit gives me the correct result when I 
use pspline() and maybe you could help correct me here.

When I add a constant to my covariate I expect only the intercept to change, 
but not the coefficients. This is true (in clogit) when I assume a linear in 
the logit model, but the same does not happen when I use pspline().

If I did something similar to what "na.gam.replace" in S+ does, the pspline 
coefficients are the same, leading me to believe that "na.gam.replace" is 
doing the right thing (I may be wrong).
The strange thing here is that even the intercepts are the same. Lastly, 
these coefficients are the same as using clogit before adding the constant, 
but the intercept are different.

I have attached my code for all this, maybe I messed it somewhere.

Vumani

##########################
xvar<-rnorm(100,0,1)
data.mult<-rbinom(100,size=3,prob=(exp(0.5+0.4*xvar)/(1+exp(0.5+0.4*xvar))))
library(Nnet)
mult.fit<-multinom(data.mult~xvar)
coef(mult.fit)
library(survival)
choice<-c(ifelse(data.mult==0,1,0),ifelse(data.mult==1,1,0),ifelse(data.mult==2,1,0),ifelse(data.mult==3,1,0))
temp<-list(time=2-choice,
            status=choice,

intercept1=c(rep(0,length(data.mult)),rep(1,length(data.mult)),rep(0,length(data.mult)),rep(0,length(data.mult))),

intercept2=c(rep(0,length(data.mult)),rep(0,length(data.mult)),rep(1,length(data.mult)),rep(0,length(data.mult))),

intercept3=c(rep(0,length(data.mult)),rep(0,length(data.mult)),rep(0,length(data.mult)),rep(1,length(data.mult))),

xvar1=c(rep(0,length(data.mult)),rep(1,length(data.mult)),rep(0,length(data.mult)),rep(0,length(data.mult)))*rep(xvar,4),

xvar2=c(rep(0,length(data.mult)),rep(0,length(data.mult)),rep(1,length(data.mult)),rep(0,length(data.mult)))*rep(xvar,4),

xvar3=c(rep(0,length(data.mult)),rep(0,length(data.mult)),rep(0,length(data.mult)),rep(1,length(data.mult)))*rep(xvar,4),
            stratum=rep(1:length(data.mult),4))
# Used to check whether my data matrix is setup correctly
# This fits a category one baseline category logit model
# Compare with multinom
totalmodel<-clogit(status~intercept1+intercept2+intercept3+xvar1+xvar2+xvar3+strata(stratum),method="exact",temp)
coef(totalmodel)
# Use smooth term, "pspline"
totalmodel<-clogit(status~intercept1+intercept2+intercept3+pspline(xvar1)+pspline(xvar2)+pspline(xvar3)+strata(stratum),method="exact",temp)
coef(totalmodel)
# Collect indices where variable is not equal to zero
indices<-list(var1=NULL,var2=NULL,var3=NULL)
indices$var1<-temp$xvar1==0
indices$var2<-temp$xvar2==0
indices$var3<-temp$xvar3==0
# Add 10 to covariate
temp$xvar1[!indices$var1]<-xvar+10
temp$xvar2[!indices$var2]<-xvar+10
temp$xvar3[!indices$var3]<-xvar+10
# First fit linear-in-the-logit model, to check what happens to coefficients
totalmodel<-clogit(status~intercept1+intercept2+intercept3+xvar1+xvar2+xvar3+strata(stratum),method="exact",temp)
coef(totalmodel)
# Fit a smooth model, and compare with smooth model above
totalmodel<-clogit(status~intercept1+intercept2+intercept3+pspline(xvar1)+pspline(xvar2)+pspline(xvar3)+strata(stratum),method="exact",temp)
coef(totalmodel)
# Doing something similar to what na.gam.replace() in S+ does
temp$xvar1[indices$var1]<-mean(xvar+10)
temp$xvar2[indices$var2]<-mean(xvar+10)
temp$xvar3[indices$var3]<-mean(xvar+10)
# Fit spline model
totalmodel<-clogit(status~intercept1+intercept2+intercept3+pspline(xvar1)+pspline(xvar2)+pspline(xvar3)+strata(stratum),method="exact",temp)
coef(totalmodel)
# Remove 10 and substitute with mean
# Covariate are like in the beginning
temp$xvar1[!indices$var1]<-xvar
temp$xvar2[!indices$var2]<-xvar
temp$xvar3[!indices$var3]<-xvar
# Doing something like na.gam.replace()
temp$xvar1[indices$var1]<-mean(xvar)
temp$xvar2[indices$var2]<-mean(xvar)
temp$xvar3[indices$var3]<-mean(xvar)
totalmodel<-clogit(status~intercept1+intercept2+intercept3+pspline(xvar1)+pspline(xvar2)+pspline(xvar3)+strata(stratum),method="exact",temp)
coef(totalmodel)

>From: ripley at stats.ox.ac.uk
>To: Vumani Dlamini <dvumani at hotmail.com>
>CC: R-help at stat.math.ethz.ch
>Subject: Re: [R] R analogue
>Date: Mon, 20 Jan 2003 13:17:02 +0000 (GMT)
>
>On Mon, 20 Jan 2003, Vumani Dlamini wrote:
>
> > Is there any R analogue for the S+ function "na.gam.replace".
>
>No, for it is tailored for use by S's gam.
>
>Some of the things it does are positively undesirable!  It uses mean
>imputation for continuous variables, but for factors it makes NA into
>another level, which silently assumes that all missing values are similar
>and that they are going to occur with sufficient frequency in the
>training data (and they may not occur at all).
>
> > I would like
> > to make an interaction of a categorical and smooth continuous covariate.
>
>You can do that without na.gam.replace: there is an example in the MASS
>scripts for the low-birth-weight data.
>
>
>--
>Brian D. Ripley,                  ripley at stats.ox.ac.uk
>Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>University of Oxford,             Tel:  +44 1865 272861 (self)
>1 South Parks Road,                     +44 1865 272866 (PA)
>Oxford OX1 3TG, UK                Fax:  +44 1865 272595