[R] ols function in rms package

Mark Seeto mark.seeto at nal.gov.au
Tue Jun 8 12:29:55 CEST 2010


> On 06/06/2010 10:49 PM, Mark Seeto wrote:
>> Hello,
>>
>> I have a couple of questions about the ols function in Frank Harrell's
>> rms
>> package.
>>
>> Is there any way to specify variables by their column number in the data
>> frame rather than by the variable name?
>>
>> For example,
>>
>> library(rms)
>> x1<- rnorm(100, 0, 1)
>> x2<- rnorm(100, 0, 1)
>> x3<- rnorm(100, 0, 1)
>> y<- x2 + x3 + rnorm(100, 0, 5)
>> d<- data.frame(x1, x2, x3, y)
>> rm(x1, x2, x3, y)
>> lm(y ~ d[,2] + d[,3], data = d)  # This works
>> ols(y ~ d[,2] + d[,3], data = d) # Gives error
>> Error in if (!length(fname) || !any(fname == zname)) { :
>>    missing value where TRUE/FALSE needed
>>
>> However, this works:
>> ols(y ~ x2 + d[,3], data = d)
>>
>> The reason I want to do this is to program variable selection for
>> bootstrap model validation.
>>
>> A related question: does ols allow "y ~ ." notation?
>>
>> lm(y ~ ., data = d[, 2:4])  # This works
>> ols(y ~ ., data = d[, 2:4]) # Gives error
>> Error in terms.formula(formula) : '.' in formula and no 'data' argument
>>
>> Thanks for any help you can give.
>>
>> Regards,
>> Mark
>
> Hi Mark,
>
> It appears that you answered the questions yourself.  rms wants real
> variables or transformations of them.  It makes certain assumptions
> about names of terms.   The y ~ . should work though; sometime I'll have
> a look at that.
>
> But these are the small questions compared to what you really want.  Why
> do you need variable selection, i.e., what is wrong with having
> insignificant variables in a model?  If you indeed need variable
> selection, see if backwards stepdown works for you.  It is built into the
> rms bootstrap validation and calibration functions.
>
> Frank
>

Thank you for your reply, Frank. I would have reached the conclusion
that rms only accepts real variables had this not worked:
ols(y ~ x2 + d[,3], data = d)
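
As a workaround, I'm thinking of building the formula from the column names
rather than indexing the data frame, so that ols sees real variable names.
This is only a rough sketch on the toy data above (columns 2 and 3 chosen
purely for illustration), and it also sidesteps the y ~ . problem:

library(rms)
vars <- names(d)[c(2, 3)]   # select columns by number; here "x2" and "x3"
form <- as.formula(paste("y ~", paste(vars, collapse = " + ")))
ols(form, data = d)         # accepted, because the formula has real names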

The reason I want to program variable selection is so that I can use the
bootstrap to check the performance of a model-selection method. My
co-workers and I have used a variable selection method which combines
forward selection, backward elimination, and best subsets (the forward and
backward methods were run using different software).
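
For what it's worth, the forward and backward parts can also be run in R
itself with step(); the sketch below is AIC-based, which may not match the
criteria the other software used, and it just reuses the toy variables from
above:

full <- lm(y ~ x1 + x2 + x3, data = d)
null <- lm(y ~ 1, data = d)
fwd  <- step(null, scope = formula(full), direction = "forward")
bwd  <- step(full, direction = "backward")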

I want to do bootstrap validation to (1) check the over-optimism in R^2,
and (2) justify using a different approach, if R^2 turns out to be very
over-optimistic. The different approach would probably be data reduction
using variable clustering, as you describe in your book.
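
For the built-in approach you mention (backwards stepdown inside the
bootstrap), I expect the code would look roughly like this on the toy data;
B = 200 and the AIC stopping rule are only placeholders:

library(rms)
fit <- ols(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)
set.seed(1)                                     # reproducible resamples
val <- validate(fit, method = "boot", B = 200, bw = TRUE, rule = "aic")
val["R-square", ]   # apparent, training, test, optimism, corrected R^2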

Regards,
Mark
-- 
Mark Seeto
Statistician

National Acoustic Laboratories <http://www.nal.gov.au/>
A Division of Australian Hearing

126 Greville Street
Chatswood NSW 2067 Australia


