[R] survival::clogit, how to extract residuals for GOF assessment

Tue Apr 26 20:04:26 CEST 2016

Hi Folks,

Hopefully this question has enough R and not too much stats to be
appropriate for this list. Based on,* Hosmer et al. 2013. Logistic
regression for matched case-control studies. Applied Logistic
Regression *(eqtn.
7.8)*, *I am assessing GOF of conditional (or matched) logistic regression
models with the *standardized Pearson residuals*. The authors define
“large” as delta chi-squared values > 4.

For a 1:1 study, I can fit a conditional logistic model via an
unconditional routine by making the response variable all 1’s, taking the
difference of the covariate values for each pair, and removing the
intercept. I can then extract the standardized residuals from the model
(see code at bottom for example). However, if I want to fit a 1:many model,
I need to use survival::clogit, which is where my question comes in:

How can I apply this same "test" to a clogit model? Which residuals should
I be extracting? Or, is this not an option for a clogit model?

## The default residuals of coxph in R are the martingale residuals.

## resid(fit1,type=c("martingale", "deviance", "score", "schoenfeld",

##                   "dfbeta", "dfbetas", "scaledsch","partial"))

R code below shows equivalence between clogit and binomial GLM fit on the
differences (note: these would not be equivalent if used a "cluster"
argument in clogit), and GOF "test" for binomial GLM fit on the
differences. I would like to assess GOF of the fit.clogit model but do not
know how.

require(survival)

require(plyr)

# Dataframe

dat.full <- structure(list(resp = c(1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L,

0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,

0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L,

0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L,

1L, 0L), x1 = c(3.92, 2.04, 2.27, 9.25, 6.13, 10.44, 5.09, 1.27,

5.9, 2.88, 3.79, 1.46, 3.35, 3.82, 4.28, 6.52, 3.54, 3.46, 0.46,

1.68, 3.22, 1.64, -0.12, 2.78, 2.58, 2, 2.83, 3.58, 1.45, 1.64,

2.89, 3.12, 5.6, 8.29, 3.42, 4.8, 3.04, 4.33, 5.31, 1.78, 8.18,

4.56, 4.85, 7.99, 7.52, 6.85, 7.64, 3.33, 5.17, 4.62, 1.24, 2.54,

3.08, 8.2, 1.81, 2.78, 2.16, 2.76, 3.45, 3.43), ID = c(10L, 10L,

11L, 11L, 13L, 13L, 17L, 17L, 18L, 18L, 23L, 23L, 25L, 25L, 29L,

29L, 31L, 31L, 33L, 33L, 35L, 35L, 38L, 38L, 39L, 39L, 4L, 4L,

41L, 41L, 43L, 43L, 45L, 45L, 46L, 46L, 49L, 49L, 50L, 50L, 54L,

54L, 55L, 55L, 56L, 56L, 57L, 57L, 59L, 59L, 6L, 6L, 60L, 60L,

7L, 7L, 8L, 8L, 9L, 9L)), .Names = c("resp", "x1", "ID"), row.names = c(1L,

2L, 3L, 4L, 7L, 8L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 21L,

22L, 23L, 24L, 25L, 26L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L,

37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 49L, 50L, 55L,

56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L,

71L, 72L, 73L, 74L, 75L, 76L), class = "data.frame")

# fit clogit

fit.clogit <- clogit(resp ~ x1 + strata(ID), method = "efron", data =
dat.full)

summary(fit.clogit)

# Make dataframe so can fit on differences with unconditional routine.

# x1 where resp = 1 minus x1 where resp = 0, grouped by ID

dat.diff <- ddply(dat.full, .(ID), summarise,

      x1.diff = x1[resp == 1] - x1[resp == 0])

dat.diff$resp <- 1 # response variable is all 1's

# Fit on differences and 1's, remove intercept

fit.diff <- glm(resp ~ -1 + x1.diff, family = binomial, data = dat.diff)

summary(fit.diff)

summary(fit.clogit)$coefficients

# GOF: delta chi-squared

plot(fit.diff$fitted.values, rstandard(fit.diff)^2)

round(sort(rstandard(fit.diff)^2), 4)

Thanks!
Joe

	[[alternative HTML version deleted]]