[R] upperbound of C index Conf.int. greater than 1

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed May 14 15:05:50 CEST 2008


DAVID ARTETA GARCIA wrote:
> 
> 
> Dear Frank
> 
> Frank E Harrell Jr <f.harrell at vanderbilt.edu> wrote:
> 
>>
>> A few observations.
>>
>> 1. With minimal overfitting, rcorr.cens(predict(fit), Y) gives a good
>> standard error for Dxy = 2*(C-.5) and bootstrapping isn't very necessary
>>
>> 2. If you bootstrap, use the nonparametric bootstrap percentile method
>> or other methods that constrain the confidence interval to be in [0,1].
>>
>> 3. I don't know why the model would be linear on the two predictors you
>> are using.
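[As an aside on points 1 and 2, here is a minimal base-R sketch with simulated data -- Hmisc::rcorr.cens is the real tool -- illustrating the relation Dxy = 2*(C - 0.5) and why a symmetric normal-approximation interval for C can spill past 1 when C is near 1, while a bootstrap percentile interval cannot. The data, sample size, and bootstrap settings are illustrative assumptions, not from the thread.]

```r
## Simulated data for illustration only
set.seed(1)
n    <- 60
x    <- rnorm(n)
y    <- rbinom(n, 1, plogis(2 * x))   # strong predictor, so C is near 1
pred <- plogis(2 * x)                 # predicted probabilities

## C index (concordance probability) by brute force over all
## (event, non-event) pairs; ties in predictions count 1/2
cindex <- function(p, y) {
  pos <- p[y == 1]; neg <- p[y == 0]
  d   <- outer(pos, neg, "-")
  (sum(d > 0) + 0.5 * sum(d == 0)) / length(d)
}

C   <- cindex(pred, y)
Dxy <- 2 * (C - 0.5)                  # Somers' Dxy, by definition

## Bootstrap percentile CI for C: stays inside [0,1] by construction
B  <- 500
Cb <- replicate(B, { i <- sample(n, replace = TRUE); cindex(pred[i], y[i]) })
ci.pct <- quantile(Cb, c(0.025, 0.975))

## A symmetric normal-approximation CI, C +/- 1.96*SE, has no such
## constraint and can overshoot 1 when C is close to 1
ci.norm <- C + c(-1, 1) * 1.96 * sd(Cb)
```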
> 
> Do you mean fitting these predictors with spline functions? I have 
> read about them in your "Regression Modeling Strategies", but I am not 
> sure I understand how to use them. I will read through it again.

Yes, or other types of splines.  In general I don't expect things to be 
linear.  If you have enough data you can always allow for nonlinearity. 
The book has a strategy for allocating degrees of freedom based on the 
predictive potential of each variable, and the following strategy also 
works:

library(rms)   # provides lrm() and rcs(); the successor to the Design package
f <- lrm(y ~ rcs(x1,5) + rcs(x2,5))   # restricted cubic splines, 5 knots each
plot(anova(f))                        # partial Wald chi-square minus d.f.

The plot shows each variable's partial Wald chi-square minus its 
degrees of freedom.  It pools the contribution of the nonlinear terms 
into each variable's total, so you won't be biased by having seen which 
nonlinear effects looked significant.  You can then reduce the d.f. or 
force linearity for those variables having lower overall partial 
chi-squares.  The plot does not bias you as long as you agree to devote 
at least one d.f. (a linear fit) to each predictor.  If you have used y 
to screen predictors to narrow them down to x1 and x2, all bets are off.
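[A hedged base-R analogue of the bookkeeping behind plot(anova(f)): rank each predictor by its whole-term chi-square minus its degrees of freedom, nonlinear pieces pooled in. rms::lrm with rcs() is the real tool; this sketch substitutes glm() with splines::ns() and likelihood-ratio rather than Wald chi-squares, and the simulated data are an assumption, purely to make the idea runnable.]

```r
library(splines)
set.seed(2)
n  <- 400
x1 <- rnorm(n); x2 <- rnorm(n)
## x1 acts nonlinearly, x2 weakly and linearly
y  <- rbinom(n, 1, plogis(sin(2 * x1) + 0.3 * x2))

## Full model and models with each whole term (all its d.f.) dropped
full <- glm(y ~ ns(x1, 4) + ns(x2, 4), family = binomial)
d1   <- glm(y ~ ns(x2, 4),             family = binomial)  # x1 term removed
d2   <- glm(y ~ ns(x1, 4),             family = binomial)  # x2 term removed

## Likelihood-ratio chi-square for each term, then subtract its d.f.;
## larger penalized values mark variables worth more spline d.f.
chisq <- c(x1 = deviance(d1) - deviance(full),
           x2 = deviance(d2) - deviance(full))
penalized <- chisq - 4                 # each ns(., 4) term spends 4 d.f.
sort(penalized, decreasing = TRUE)     # x1 should dominate in this setup
```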

Frank

> 
> David
> 
>>
>> Frank
>>
>> -- 
>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>                      Department of Biostatistics   Vanderbilt University
>



More information about the R-help mailing list