[R] Results from clogit out of range?

Terry Therneau therneau at mayo.edu
Mon Mar 4 15:04:29 CET 2013


I'm late to this discussion, but let me try to put it in another context.
   Assume that I wanted to know whether kids who live west of their school or east of 
their school are more likely to be early (some hypothesis about walking slower if the sun 
is in their eyes).  So I create a 0/1 variable east/west and get samples of 10 student 
arrival times at each of 100 different schools.  Fit the model

    lm(arrive ~ factor(school) + east.west)

where "arrive" is in some common scale like "minutes since midnight".  Since different 
schools could have different starting times for their first class we need an intercept per 
school.

   Two questions:
      1. Incremental effect: the coefficient of east/west measures the incremental effect 
across all schools.  With n of 1000 it is likely estimated with high precision.
      2. Absolute: predict the average arrival time (on the clock) for students.
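A small simulation makes the contrast between the two questions visible.  This is only a 
sketch with made-up data: the school start times, the 2-minute east/west effect and the 
noise level (sd = 5 minutes) are assumptions, not part of the original example.  The 
east/west coefficient draws on all 1000 students, while the predicted arrival time at one 
particular school rests on that school's 10 students and so has a much larger standard error.

```r
# Sketch with simulated data; start times, effect size and noise are made up.
set.seed(1)
n_school  <- 100
n_per     <- 10
school    <- rep(seq_len(n_school), each = n_per)         # 100 schools x 10 students
east.west <- rbinom(n_school * n_per, 1, 0.5)              # 0 = west, 1 = east of school
start     <- sample(c(450, 465, 480, 495), n_school, replace = TRUE)  # minutes since midnight
arrive    <- start[school] - 10 + 2 * east.west + rnorm(n_school * n_per, sd = 5)

# One intercept per school plus a common east/west effect
fit <- lm(arrive ~ factor(school) + east.west)

# Question 1 (incremental): a single coefficient informed by all 1000 students
est <- summary(fit)$coefficients["east.west", c("Estimate", "Std. Error")]
print(est)

# Question 2 (absolute): mean arrival for east-side students at school 1,
# which depends on only that school's 10 students
pred <- predict(fit, newdata = data.frame(school = 1, east.west = 1), se.fit = TRUE)
print(c(fit = unname(pred$fit), se = unname(pred$se.fit)))
```

In a typical run the standard error for question 2 is several times that of the east/west 
coefficient, and the prediction is only available for a school that appears in the data.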

Conditional logistic regression is very much like this.  We have a large number of strata 
("schools") with a small number of observations in each (often only 2 per stratum).  One 
can ask incremental questions about variables common to all strata, but absolute prediction 
is pretty worthless: a. you can only do it for schools (strata) that have already been seen, 
and b. there are so few subjects in each of them that the estimates are very noisy.
   The default prediction from clogit is focused on questions of type 1.  The 
documentation doesn't even bother to mention predictions of type 2, which would be 
probabilities of events.  I can think of a way to extract such output from the routine 
(being the author gives some insight), but why would I want to?
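For the conditional logistic case, a sketch along the same lines: simulate 500 matched 
pairs (one case, one control each) directly from the conditional logistic model and fit 
them with clogit() from the survival package.  The pair structure, the true log odds ratio 
of 1 and the variable names are assumptions for illustration.  The common coefficient is 
recovered well even with only 2 subjects per stratum, while the per-pair baselines are 
conditioned out of the likelihood and never estimated.

```r
# Sketch with simulated 1:1 matched pairs; names and true beta are made up.
library(survival)

set.seed(42)
n_pairs <- 500
beta    <- 1                         # true log odds ratio for x

x1 <- rnorm(n_pairs)                 # covariate for the first member of each pair
x2 <- rnorm(n_pairs)                 # covariate for the second member
# Within a pair, the first member is the case with probability
# exp(beta*x1) / (exp(beta*x1) + exp(beta*x2)) = plogis(beta * (x1 - x2))
first_is_case <- rbinom(n_pairs, 1, plogis(beta * (x1 - x2)))

# Long format: two rows per pair, interleaved as (x1[1], x2[1], x1[2], x2[2], ...)
dat <- data.frame(
  pair = rep(seq_len(n_pairs), each = 2),
  x    = as.vector(rbind(x1, x2)),
  case = as.vector(rbind(first_is_case, 1 - first_is_case))
)

# Each pair is its own stratum; only the coefficient of x is estimated
fit <- clogit(case ~ x + strata(pair), data = dat)
print(coef(fit))                     # incremental (type 1) question: close to beta
```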

Terry Therneau


