[R] ols function in rms package
Frank E Harrell Jr
f.harrell at Vanderbilt.Edu
Tue Jun 8 14:23:12 CEST 2010
On 06/08/2010 05:29 AM, Mark Seeto wrote:
>
>> On 06/06/2010 10:49 PM, Mark Seeto wrote:
>>> Hello,
>>>
>>> I have a couple of questions about the ols function in Frank Harrell's
>>> rms
>>> package.
>>>
>>> Is there any way to specify variables by their column number in the data
>>> frame rather than by the variable name?
>>>
>>> For example,
>>>
>>> library(rms)
>>> x1<- rnorm(100, 0, 1)
>>> x2<- rnorm(100, 0, 1)
>>> x3<- rnorm(100, 0, 1)
>>> y<- x2 + x3 + rnorm(100, 0, 5)
>>> d<- data.frame(x1, x2, x3, y)
>>> rm(x1, x2, x3, y)
>>> lm(y ~ d[,2] + d[,3], data = d) # This works
>>> ols(y ~ d[,2] + d[,3], data = d) # Gives error
>>> Error in if (!length(fname) || !any(fname == zname)) { :
>>> missing value where TRUE/FALSE needed
>>>
>>> However, this works:
>>> ols(y ~ x2 + d[,3], data = d)
>>>
>>> The reason I want to do this is to program variable selection for
>>> bootstrap model validation.
>>>
>>> A related question: does ols allow "y ~ ." notation?
>>>
>>> lm(y ~ ., data = d[, 2:4]) # This works
>>> ols(y ~ ., data = d[, 2:4]) # Gives error
>>> Error in terms.formula(formula) : '.' in formula and no 'data' argument
>>>
>>> Thanks for any help you can give.
>>>
>>> Regards,
>>> Mark
>>
>> Hi Mark,
>>
>> It appears that you answered the questions yourself. rms wants real
>> variables or transformations of them. It makes certain assumptions
>> about names of terms. The y ~ . should work though; sometime I'll have
>> a look at that.
>>
>> But these are the small questions compared to what you really want. Why
>> do you need variable selection, i.e., what is wrong with having
>> insignificant variables in a model? If you indeed need variable
>> selection see if backwards stepdown works for you. It is built into
>> rms bootstrap validation and calibration functions.
>>
>> Frank
>>
>
> Thank you for your reply, Frank. I would have reached the conclusion
> that rms only accepts real variables had this not worked:
> ols(y ~ x2 + d[,3], data = d)
Hi Mark - that probably worked by accident.
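Since rms wants real variable names in the formula, one way to select
predictors by column number is to look the names up and build the formula
from them. The following is a minimal sketch using base R only; the object
names (cols, form, form_all) are illustrative and not part of rms, and the
ols call is only attempted if rms is installed. The same idea expands
"y ~ ." into explicit names.

```r
# Simulated data mirroring the example earlier in this thread
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- d$x2 + d$x3 + rnorm(100, 0, 5)

# Choose predictors by column number, then convert to real variable names
cols <- c(2, 3)                                      # hypothetical selection
form <- reformulate(names(d)[cols], response = "y")  # y ~ x2 + x3

# Expand the dot into explicit names for fitters that do not accept "y ~ ."
form_all <- formula(terms(y ~ ., data = d[, 2:4]))   # y ~ x2 + x3

# lm accepts the constructed formula directly
fit_lm <- lm(form, data = d)

# Assumption: ols accepts a formula held in a variable, as lm does
if (requireNamespace("rms", quietly = TRUE)) {
  fit_ols <- rms::ols(form, data = d)
}
```

Because the formula then contains only plain names such as x2 and x3, it
avoids the d[,2] terms that triggered the error above.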
>
> The reason I want to program variable selection is so that I can use the
> bootstrap to check the performance of a model-selection method. My
> co-workers and I have used a variable selection method which combines
> forward selection, backward elimination, and best subsets (the forward and
> backward methods were run using different software).
>
> I want to do bootstrap validation to (1) check the over-optimism in R^2,
> and (2) justify using a different approach, if R^2 turns out to be very
> over-optimistic. The different approach would probably be data reduction
> using variable clustering, as you describe in your book.
The validate.ols function which calls the predab.resample function may
give you some code to start with. Note however that the performance of
the approach you are suggesting has already been shown to be poor in
many cases. You might run the following in parallel: full model fits
and penalized least squares using penalties selected by AIC (using
special arguments to ols along with the pentrace function).
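As a rough sketch of the comparison described above, assuming the rms
package is installed: the simulated data, the number of resamples, and the
penalty grid are arbitrary choices made for illustration, not
recommendations.

```r
library(rms)

set.seed(2)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- d$x2 + d$x3 + rnorm(100, 0, 5)

# Full model; x = TRUE, y = TRUE store the design matrix and response,
# which validate and pentrace need
full <- ols(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)

# Bootstrap validation of the full model (optimism-corrected R^2 etc.)
v_full <- validate(full, B = 200)

# Same validation, repeating backward stepdown within each resample
v_bw <- validate(full, B = 200, bw = TRUE)

# Penalized least squares: search a grid of penalties, choosing by AIC
p <- pentrace(full, penalty = seq(0, 20, by = 0.5), which = "aic")

# Refit using the selected penalty
pen <- update(full, penalty = p$penalty)
```

Printing v_full and v_bw side by side shows how much optimism the stepdown
procedure adds compared with fitting the full model.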
Frank
>
> Regards,
> Mark
--
Frank E Harrell Jr Professor and Chairman School of Medicine
Department of Biostatistics Vanderbilt University