[R] Cox regression model for matched data with replacement
Therneau, Terry M., Ph.D.
therneau at mayo.edu
Wed Aug 13 16:19:52 CEST 2014
On 08/13/2014 08:38 AM, John Pura wrote:
> Thank you for the reply. However, I think I may not have clarified what my cases are. I'm studying the effect of radiation treatment (vs. none) on survival. My cases are patients who received radiation and controls are those who did not. I used a propensity score model to match cases to controls in a 1:2 fashion. However, because the matching was done with replacement, some controls were matched to more than one case. How can I go about analyzing this - would frequency weighting work?
We went down the wrong path. When people use the word "case" it almost always refers to
"subjects who had the outcome". If I read the above correctly you have the simpler
situation of subset selection: subjects were chosen to be in the model without reference
to their outcome status, with the goal of balancing treatment wrt other predictive
factors. Correct? If so, here are my preferred modeling strategies, in order.
1. coxph(Surv(time, status) ~ treatment, data=one)
Where data set "one" has one copy of each subject selected to be in the study. If they
were nominated twice they still appear once. Optional: give each control a case weight
equal to the number of times they were selected. This will better balance the data set
wrt the factors.
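A minimal sketch of how data set "one" and the optional case weights might be built. The
cohort, the match table, and every column name here (id, treated_id, control_id, wt) are
simulated stand-ins for illustration, not the poster's actual data.

```r
library(survival)

set.seed(1)
# Simulated cohort, one row per subject (illustrative only)
n <- 200
cohort <- data.frame(
  id        = 1:n,
  treatment = rep(c(1, 0), c(50, 150)),
  time      = rexp(n, rate = 0.1),
  status    = rbinom(n, 1, 0.7)
)
treated_ids <- cohort$id[cohort$treatment == 1]
control_ids <- cohort$id[cohort$treatment == 0]

# Hypothetical 1:2 match table: two distinct controls per treated subject,
# but a control may be reused across treated subjects (matching with replacement)
matches <- data.frame(
  treated_id = rep(treated_ids, each = 2),
  control_id = unlist(lapply(treated_ids, function(i) sample(control_ids, 2)))
)

# How many times was each control selected?
n_selected <- table(matches$control_id)
selected_controls <- as.integer(names(n_selected))

# Data set "one": every selected subject appears exactly once
one <- cohort[cohort$id %in% c(treated_ids, selected_controls), ]

# Optional case weight: 1 for treated, selection count for controls
one$wt <- 1
one$wt[match(selected_controls, one$id)] <- as.vector(n_selected)

# Unweighted fit, then the weighted fit with a robust variance
fit1  <- coxph(Surv(time, status) ~ treatment, data = one)
fit1w <- coxph(Surv(time, status) ~ treatment, data = one,
               weights = wt, robust = TRUE)
```

The control weights sum to the number of rows in the match table, so the weighted data
carry the same 1:2 balance that the matching produced.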
2. Same model, with covariates. The argument about whether covariates on which you have
balanced should be included in the model is as old as the hills --- "belt AND suspenders?"
--- with proponents on both sides. Meh. Unless there are too many covariates, of course. I still
like the 10-20 events per covariate rule to choose the maximum number of predictors.
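As a rough illustration of the events-per-covariate rule, the sketch below counts events
and derives the implied ceiling on predictors before fitting the covariate model. The data
are simulated, and the covariates age and sex are placeholders for whatever was used in the
propensity model.

```r
library(survival)

set.seed(4)
# Simulated stand-in for data set "one" (illustrative only)
n <- 150
one <- data.frame(
  treatment = rep(c(1, 0), c(50, 100)),
  age       = rnorm(n, mean = 60, sd = 10),
  sex       = rbinom(n, 1, 0.5),
  time      = rexp(n, rate = 0.1),
  status    = rbinom(n, 1, 0.7)
)

# The rule counts events, not subjects
n_events <- sum(one$status)

# Ceiling on predictors at 20 events each (conservative) and 10 each (lenient)
max_predictors <- floor(n_events / c(20, 10))

# Model 2: treatment plus the balancing covariates
fit2 <- coxph(Surv(time, status) ~ treatment + age + sex, data = one)

# Number of coefficients actually fitted, to compare against the ceiling
n_coef <- length(coef(fit2))
```

If n_coef exceeds the conservative ceiling, that is a signal to trim the covariate list
rather than a hard failure.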
3. coxph(Surv(time, status) ~ treatment + strata(group), data=two)
I view this as model 2 with paranoia. "The covariate effects are so odd that we'll
never model them correctly, so treat each combination as unique." The data set two needs
to have each treated subject + their controls in a separate stratum. Even though some
controls are in the data set twice, they don't need case weights since they are in any
given stratum only once.
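One way data set "two" could be assembled, sketched on simulated data. The match table and
the column names (group, treated_id, control_id) are assumptions made for the example; each
treated subject defines a stratum, and a reused control gets one row per stratum it
belongs to.

```r
library(survival)

set.seed(2)
# Simulated cohort, one row per subject (illustrative only)
n <- 200
cohort <- data.frame(
  id        = 1:n,
  treatment = rep(c(1, 0), c(50, 150)),
  time      = rexp(n, rate = 0.1),
  status    = rbinom(n, 1, 0.7)
)
treated_ids <- cohort$id[cohort$treatment == 1]
control_ids <- cohort$id[cohort$treatment == 0]

# Hypothetical 1:2 match table; controls can recur across groups
matches <- data.frame(
  group      = rep(seq_along(treated_ids), each = 2),
  treated_id = rep(treated_ids, each = 2),
  control_id = unlist(lapply(treated_ids, function(i) sample(control_ids, 2)))
)

# One row per (stratum, subject): the treated subject plus its two controls
membership <- rbind(
  data.frame(group = seq_along(treated_ids), id = treated_ids),
  data.frame(group = matches$group, id = matches$control_id)
)

# Attach outcome data; a reused control is duplicated, once per stratum
two <- merge(membership, cohort, by = "id")

# Model 3: treatment effect estimated within matched sets
fit3 <- coxph(Surv(time, status) ~ treatment + strata(group), data = two)
```

Every stratum ends up with three rows, and no subject appears twice within a stratum,
which is why no case weights are needed here.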
For any of the above you can add a robust variance; it is required if case weights are used.
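A small sketch of requesting the robust variance in coxph, again on simulated data with
made-up weights (the wt column stands in for selection counts). With robust = TRUE the fit
keeps both the model-based and the robust variance, so the two can be compared.

```r
library(survival)

set.seed(3)
# Simulated one-row-per-subject data (illustrative only)
n <- 150
d <- data.frame(
  id        = 1:n,
  treatment = rep(c(1, 0), c(50, 100)),
  time      = rexp(n, rate = 0.1),
  status    = rbinom(n, 1, 0.7)
)

# Made-up case weights: 1 for treated, 1 to 3 for controls
d$wt <- ifelse(d$treatment == 1, 1, sample(1:3, n, replace = TRUE))

# Weighted fit with the robust (sandwich) variance
fit <- coxph(Surv(time, status) ~ treatment, data = d,
             weights = wt, robust = TRUE)

se_naive  <- sqrt(diag(fit$naive.var))  # model-based standard error
se_robust <- sqrt(diag(fit$var))        # robust standard error
```

For a data set like "two", where a control can occupy several rows, adding a cluster(id)
term to the formula is one way to ask for a robust variance grouped by subject.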