[R] ols function in rms package

Frank E Harrell Jr f.harrell at Vanderbilt.Edu
Tue Jun 8 14:23:12 CEST 2010


On 06/08/2010 05:29 AM, Mark Seeto wrote:
>
>> On 06/06/2010 10:49 PM, Mark Seeto wrote:
>>> Hello,
>>>
>>> I have a couple of questions about the ols function in Frank Harrell's
>>> rms
>>> package.
>>>
>>> Is there any way to specify variables by their column number in the data
>>> frame rather than by the variable name?
>>>
>>> For example,
>>>
>>> library(rms)
>>> x1<- rnorm(100, 0, 1)
>>> x2<- rnorm(100, 0, 1)
>>> x3<- rnorm(100, 0, 1)
>>> y<- x2 + x3 + rnorm(100, 0, 5)
>>> d<- data.frame(x1, x2, x3, y)
>>> rm(x1, x2, x3, y)
>>> lm(y ~ d[,2] + d[,3], data = d)  # This works
>>> ols(y ~ d[,2] + d[,3], data = d) # Gives error
>>> Error in if (!length(fname) || !any(fname == zname)) { :
>>>     missing value where TRUE/FALSE needed
>>>
>>> However, this works:
>>> ols(y ~ x2 + d[,3], data = d)
>>>
>>> The reason I want to do this is to program variable selection for
>>> bootstrap model validation.
>>>
>>> A related question: does ols allow "y ~ ." notation?
>>>
>>> lm(y ~ ., data = d[, 2:4])  # This works
>>> ols(y ~ ., data = d[, 2:4]) # Gives error
>>> Error in terms.formula(formula) : '.' in formula and no 'data' argument
>>>
>>> Thanks for any help you can give.
>>>
>>> Regards,
>>> Mark
>>
>> Hi Mark,
>>
>> It appears that you answered the questions yourself.  rms wants real
>> variables or transformations of them.  It makes certain assumptions
>> about the names of terms.   The y ~ . idiom should work, though;
>> sometime I'll have a look at that.
>>
>> But these are the small questions compared to what you really want.  Why
>> do you need variable selection, i.e., what is wrong with having
>> insignificant variables in a model?  If you indeed need variable
>> selection, see if backwards stepdown works for you.  It is built into
>> the rms bootstrap validation and calibration functions.
>>
>> Frank
>>
>
> Thank you for your reply, Frank. I would have reached the conclusion
> that rms only accepts real variables had this not worked:
> ols(y ~ x2 + d[,3], data = d)

Hi Mark - that probably worked by accident.
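[Editor's note: the thread itself gives no fix, but one common base-R workaround for selecting predictors by column number is to build the formula from column names with paste() and as.formula(), so the fitting function sees real variable names rather than d[, j] expressions. A minimal sketch (the data set below mirrors Mark's example):

```r
## Sketch of a workaround (editor's addition, not from the thread):
## construct the formula from column names so that ols() sees real
## variable names instead of expressions like d[, 2].
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- d$x2 + d$x3 + rnorm(100, 0, 5)

vars <- names(d)[c(2, 3)]  # choose predictors by column number
f <- as.formula(paste("y ~", paste(vars, collapse = " + ")))
f                          # y ~ x2 + x3

fit <- lm(f, data = d)     # the same formula object also works with rms::ols(f, data = d)
coef(fit)
```

Because the formula contains only plain variable names, it satisfies the naming assumptions rms makes about model terms.]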

>
> The reason I want to program variable selection is so that I can use the
> bootstrap to check the performance of a model-selection method. My
> co-workers and I have used a variable selection method which combines
> forward selection, backward elimination, and best subsets (the forward and
> backward methods were run using different software).
>
> I want to do bootstrap validation to (1) check the over-optimism in R^2,
> and (2) justify using a different approach, if R^2 turns out to be very
> over-optimistic. The different approach would probably be data reduction
> using variable clustering, as you describe in your book.

The validate.ols function, which calls the predab.resample function, may 
give you some code to start with.  Note, however, that the performance of 
the approach you are suggesting has already been shown to be poor in 
many cases.  You might run the following in parallel: full model fits 
and penalized least squares using penalties selected by AIC (using 
special arguments to ols along with the pentrace function).
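[Editor's note: as a rough illustration of the two parallel analyses Frank suggests, assuming the current rms interface (check the package documentation for details), the workflow might look like this. The data frame d and the variable names are placeholders:

```r
library(rms)  # Harrell's rms package, from CRAN
dd <- datadist(d); options(datadist = "dd")

## Full model; x = TRUE, y = TRUE stores the design matrix and
## response so that validate() can resample the fit.
fit <- ols(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)

## Bootstrap validation of apparent R^2 etc.; bw = TRUE runs fast
## backwards stepdown inside each bootstrap repetition.
validate(fit, method = "boot", B = 200, bw = TRUE)

## Penalized least squares: pentrace() searches a grid of penalties
## and reports the one best by effective AIC; refit with it.
pen <- pentrace(fit, seq(0, 20, by = 0.5))
fit.pen <- update(fit, penalty = pen$penalty)
```

Comparing the validated indexes of the stepdown model against the penalized full model is one way to quantify the over-optimism Mark is worried about.]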

Frank

>
> Regards,
> Mark


-- 
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University


