[R] newbie problem using Design.rcs

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Dec 23 22:57:12 CET 2008

sp wrote:
> Sincere thanks for both the replies.
> 0. I agree, I'm waiting for my copy of a regression book to arrive. Meanwhile, I'm trying to read on google.
> 1. My bad, I'm using Gaussian noise.
> 2. I didn't have x^3 b/c that co-efficient happens to be zero in this fitting.

That's strange.

> 3. I used lines() b/c I wanted to superimpose the curve from regression atop my first plot of the original data points (x,y). 
> I'm not sure how to use plot(f, x1 = NA) after my first plot(). The examples I managed to find on google all use plot() followed by lines(). [In Matlab, I'd just say "hold" in between these calls.]

plot(f, x1=NA)
plot(f, x2=NA, add=TRUE)

> Also, I'm forced to call win.graph() before my first plot() to see the first plot. Is that normal?


> 4. I really could use some guidance on this part. I need to use rcs() to fit points in a high-dimensional space and I'm trying to understand and use it correctly. 

keep reading

> I started with testing it on just x,y dimensions so that I can visually evaluate the fitting. I tried y=x, y=x^2 etc, adding Gaussian noise each time (to the y). 
> I plot original x,y and x,y' where y' is calculated using the co-efficients returned by rcs. I find that the regression curve differs from the actual points by as high as 10^5 with 3 knots and roughly -10^5 with 4 knots as I make y=x^2, y=x^3....

wait until you have studied regression
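
For reference, here is a sketch of the algebraic form behind rcs(x, 3), following the standard textbook definition of a restricted cubic spline. The knot symbols t_1 < t_2 < t_3 and the coefficient names beta_0, beta_1, beta_2 are notation assumed here for illustration; they are not output from the fit discussed in this thread.

```latex
\hat{y} = \beta_0 + \beta_1 x + \beta_2 x', \qquad
x' = (x - t_1)_+^3
   - (x - t_2)_+^3 \, \frac{t_3 - t_1}{t_3 - t_2}
   + (x - t_3)_+^3 \, \frac{t_2 - t_1}{t_3 - t_2},
\qquad (u)_+ = \max(u, 0)
```

With 3 knots the fit has only three coefficients, and the third multiplies the constructed term x', not x^2. Plugging the coefficients into coef[1] + coef[2]*x + coef[3]*x^2 therefore does not reproduce the fitted curve. As far as I know, Design also rescales x' by default (dividing by (t_3 - t_1)^2), so Function(f) remains the reliable way to see the exact expression for a given fit.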


> If this is NOT a good way to test fitting, could you pls tell me a better way?
> Respectfully,
> sp
> --- On Tue, 12/23/08, Frank E Harrell Jr <f.harrell at vanderbilt.edu> wrote:
>> From: Frank E Harrell Jr <f.harrell at vanderbilt.edu>
>> Subject: Re: [R] newbie problem using Design.rcs
>> To: "David Winsemius" <dwinsemius at comcast.net>
>> Cc: to_rent_2000 at yahoo.com, r-help at r-project.org
>> Date: Tuesday, December 23, 2008, 9:41 AM
>> In addition to David's excellent response, I'll add
>> that your problems seem to be statistical and not
>> programming ones.  I recommend that you spend a significant
>> amount of time with a good regression text or course before
>> using the software.  Also, with Design you can find out the
>> algebraic form of the fit:
>> f <- ols(y ~ rcs(x,3), data=mydata)
>> Function(f)
>> Frank
>> David Winsemius wrote:
>>> On Dec 22, 2008, at 11:38 PM, sp wrote:
>>>> Hi,
>>>> I read data from a file. I'm trying to understand how to use Design.rcs by using simple test data first. I use 1000 integer values (1,...,1000) for x (the predictor) with some noise (x+.02*x) and I set the response variable y=x. Then, I try rcs and ols as follows:
>>> Not sure what sort of noise that is.
>>>> m = ( sqrt(y1) ~ ( rcs(x1,3) ) ); #I tried without sqrt also
>>>> f = ols(m, data=data_train.df);
>>>> print(f);
>>>> [I plot original x1,y1 vectors and the regression as in
>>>> y <- coef2[1] + coef2[2]*x1 + coef2[3]*x1*x1]
>>> That does not look as though it would capture the structure of a restricted **cubic** spline. The usual method in Design for plotting a model prediction would be:
>>> plot(f, x1 = NA)
>>>> But this gives me a VERY bad fit:
>>>> "
>>> Can you give some hint why you consider this to be a "VERY bad fit"? It appears a rather good fit to me, despite the test case apparently not being constructed with any curvature, which is what the rcs modeling strategy should be detecting.
>> -- Frank E Harrell Jr   Professor and Chair           School of Medicine
>>                      Department of Biostatistics   Vanderbilt University

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
