[R] Great difference for piecewise linear function between R and SAS

Duncan Murdoch murdoch at stats.uwo.ca
Mon Mar 24 14:11:16 CET 2008


On 3/24/2008 9:03 AM, zhijie zhang wrote:
> Dear  Murdoch,
>   Thanks very much for your rapid response. It helps me greatly.
>   If i want to write the model formula according to the estimets from 
> R,  Is it correct for the below formula? I'm not very sure about it, and 
> i also hope that you can recommend a good book on this area. I want to 
> learn it systematically. Thanks a million.
> Logit/p/=12.10+5.815*x+6.654*elevation-6.755*elevation^2
>                       -1.291*distance_1 -10.348*distance2-3.53*distance3
>           _ -6.828*_y1 _ -4.426*_y2 _ -11.216*_y3 _

I would use predict() instead.  What you have there doesn't look as
though it uses the B-spline basis.

The reference given in the ?bs help page is a reasonable starting point,
but just about any book that covers splines should handle the B-spline
basis and the linear case.

Duncan Murdoch

>  #R code and the estimate results--Knots for distance are 16.13 and 24, 
> respectively, and Knots for y are -0.4357 and -0.3202
> m.glm<-glm(mark~x+poly(elevation,2)+bs(distance,degree=1,knots=c(16.13,24))
>                 
> +bs(y,degree=1,knots=c(-0.4357,-0.3202)),family=binomial(logit),data=point)
> summary(m.glm)
> 
>                                                       Estimate Std. 
> Error z value Pr(>|z|)
> (Intercept)                                             12.104      
> 3.138   3.857 0.000115 ***
> x                                                       5.815      
> 1.987   2.926 0.003433 **
> poly(elevation, 2)1                                     6.654      
> 4.457   1.493 0.135444
> poly(elevation, 2)2                                     -6.755      
> 3.441  -1.963 0.049645 *
> bs(distance, degree = 1, knots = c(16.13, 24))1         -1.291      
> 1.139  -1.133 0.257220
> bs(distance, degree = 1, knots = c(16.13, 24))2        -10.348      
> 2.025  -5.110 3.22e-07 ***
> bs(distance, degree = 1, knots = c(16.13, 24))3        -3.530      
> 3.752  -0.941 0.346850
> bs(y, degree = 1, knots = c(-0.4357, -0.3202))1        -6.828      
> 1.836  -3.719 0.000200 ***
> bs(y, degree = 1, knots = c(-0.4357, -0.3202))2        -4.426      
> 1.614  -2.742 0.006105 **
> bs(y, degree = 1, knots = c(-0.4357, -0.3202))3        -11.216      
> 2.861  -3.920 8.86e-05 ***
> 
>   Thanks again.
> 
> 
> 
> On Mon, Mar 24, 2008 at 8:08 PM, Duncan Murdoch <murdoch at stats.uwo.ca 
> <mailto:murdoch at stats.uwo.ca>> wrote:
> 
>     On 24/03/2008 7:06 AM, zhijie zhang wrote:
>      > Dear Murdoch,
>      >    "Compare the predictions, not the coefficients.", this is the
>     key point.
>      >    I have checked their predicted probability and their results
>     are the
>      > same.
>      >   What do you mean by  "You're using different bases"? Could you
>     please give
>      > me a little more info on it, so that i can go to find some materials?
> 
>     There are many ways to represent a piecewise linear function.  R and
>     your SAS code have both chosen linear combinations of basis functions,
>     but you have used different basis functions.  R uses the B-spline basis.
>     You have what is sometimes called the truncated power basis, but maybe
>     not commonly in the linear context.  I don't know if it has a name here.
> 
>     Duncan Murdoch
> 
>      >   Thanks a lot.
>      >
>      >
>      > On 3/24/08, Duncan Murdoch <murdoch at stats.uwo.ca
>     <mailto:murdoch at stats.uwo.ca>> wrote:
>      >> On 24/03/2008 5:23 AM, zhijie zhang wrote:
>      >>> Dear Rusers,
>      >>>   I am now using  R and SAS to fit the piecewise linear
>     functions, and
>      >> what
>      >>> surprised me is that they have a great differrent result. See
>     below.
>      >> You're using different bases.  Compare the predictions, not the
>      >> coefficients.
>      >>
>      >> To see the bases that R uses, do this:
>      >>
>      >> matplot(distance, bs(distance,degree=1,knots=c(16.13,24)))
>      >> matplot(y, bs(y,degree=1,knots=c(-0.4357,-0.3202)))
>      >>
>      >> Duncan Murdoch
>      >>
>      >>> #R code--Knots for distance are 16.13 and 24, respectively, and
>     Knots
>      >> for y
>      >>> are -0.4357 and -0.3202
>      >>>
>     m.glm<-glm(mark~x+poly(elevation,2)+bs(distance,degree=1,knots=c(16.13
>      >> ,24))
>      >>>                 +bs(y,degree=1,knots=c(-0.4357,-0.3202
>      >>> )),family=binomial(logit),data=point)
>      >>> summary(m.glm)
>      >>>
>      >>> Coefficients:
>      >>>                                                       Estimate Std.
>      >> Error z
>      >>> value Pr(>|z|)
>      >>> (Intercept)                                             12.104
>      >> 3.138
>      >>> 3.857 0.000115 ***
>      >>> x                                                       5.815  
>        1.987
>      >>> 2.926 0.003433 **
>      >>> poly(elevation, 2)1                                     6.654  
>        4.457
>      >>> 1.493 0.135444
>      >>> poly(elevation, 2)2                                     -6.755
>      >> 3.441  -
>      >>> 1.963 0.049645 *
>      >>> bs(distance, degree = 1, knots = c(16.13, 24))1         -1.291
>      >> 1.139  -
>      >>> 1.133 0.257220
>      >>> bs(distance, degree = 1, knots = c(16.13, 24))2        -10.348
>      >> 2.025  -
>      >>> 5.110 3.22e-07 ***
>      >>> bs(distance, degree = 1, knots = c(16.13, 24))3        -3.530  
>        3.752
>      >>   -
>      >>> 0.941 0.346850
>      >>> bs(y, degree = 1, knots = c(-0.4357, -0.3202))1        -6.828  
>        1.836
>      >>   -
>      >>> 3.719 0.000200 ***
>      >>> bs(y, degree = 1, knots = c(-0.4357, -0.3202))2        -4.426  
>        1.614
>      >>   -
>      >>> 2.742 0.006105 **
>      >>> bs(y, degree = 1, knots = c(-0.4357, -0.3202))3        -11.216
>      >> 2.861  -
>      >>> 3.920 8.86e-05 ***
>      >>>
>      >>> #SAS codes
>      >>> data b;
>      >>>  set a;
>      >>>  if distance > 16.13 then d1=1; else d1=0;
>      >>>  distance2=d1*(distance - 16.13);
>      >>>  if distance > 24 then d2=1; else d2=0;
>      >>>  distance3=d2*(distance - 24);
>      >>>  if y>-0.4357 then dd1=1; else dd1=0;
>      >>>  y2=dd1*(y+0.4357);
>      >>>  if y>-0.3202 then dd2=1; else dd2=0;
>      >>>  y3=dd2*(y+0.3202);
>      >>> run;
>      >>>
>      >>> proc logistic descending data=b;
>      >>>  model mark =x elevation elevation*elevation distance distance2
>      >> distance3 y
>      >>> y2 y3;
>      >>> run;
>      >>>
>      >>>
>      >>>               The LOGISTIC Procedure  Analysis of Maximum
>     Likelihood
>      >>> Estimates
>      >>>
>      >>>                                                      Standard  
>            Wald
>      >>>            Parameter               DF    Estimate       Error
>      >>> Chi-Square    Pr > ChiSq
>      >>>
>      >>>            Intercept                1     -2.6148      2.1445
>      >>> 1.4867
>      >>> 0.2227
>      >>>            x                        1      5.8146      1.9872
>      >>> 8.5615
>      >>> 0.0034
>      >>>            elevation                1      0.4457      0.1506
>      >>> 8.7545
>      >>> 0.0031
>      >>>            elevation*elevation      1     -0.0279      0.0142
>      >>> 3.8533
>      >>> 0.0496
>      >>>            distance                 1     -0.1091      0.0963
>      >>> 1.2836
>      >>> 0.2572
>      >>>            distance2                1     -1.0418      0.2668
>      >>> 15.2424
>      >>> <.0001
>      >>>            distance3                1      2.8633      0.7555
>      >>> 14.3625
>      >>> 0.0002
>      >>>            y                        1    -16.2032      4.3568
>      >>> 13.8314
>      >>> 0.0002
>      >>>            y2                       1     36.9974     10.3219
>      >>> 12.8476
>      >>> 0.0003
>      >>>            y3                       1    -58.4938     14.0279
>      >>> 17.3875
>      >>> <.0001
>      >>> Q: What is the problem? which one is correct for the piecewise
>     linear
>      >>> function?
>      >>>   Thanks very much.
>      >>>
>      >>>
>      >>
>      >
>      >
> 
> 
> 
> 
> -- 
> With Kind Regards,
> 
> oooO:::::::::
> (..):::::::::
> :\.(:::Oooo::
> ::\_)::(..)::
> :::::::)./:::
> ::::::(_/::::
> :::::::::::::
> [***********************************************************************]
> Zhi Jie,Zhang ,PHD
> Tel:+86-21-54237149
> Dept. of Epidemiology,School of Public Health,Fudan University
> Address:No. 138 Yi Xue Yuan Road,Shanghai,China
> Postcode:200032
> Email:epistat at gmail.com <mailto:Email:epistat at gmail.com>
> Website: www.statABC.com <http://www.statABC.com>
> [***********************************************************************]
> oooO:::::::::
> (..):::::::::
> :\.(:::Oooo::
> ::\_)::(..)::
> :::::::)./:::
> ::::::(_/::::
> :::::::::::::



More information about the R-help mailing list