[R] An R vs. SAS Discrepancy: How do I determine which is correct?

Kevin E. Thorpe kevin.thorpe at utoronto.ca
Tue Dec 1 20:44:52 CET 2009


I was messing around with some data in R and SAS (the reason is
unimportant) fitting a multiple linear regression and got a
curious discrepancy.  The data set is too big to post, but if
someone wants it, I can send it.

So, here are the (partial) results:

From R:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 61.11434    1.48065  41.275  < 2e-16 ***
sexWomen     2.91108    0.35753   8.142    5e-16 ***
diabp        0.20675    0.01504  13.746  < 2e-16 ***
age         -0.08085    0.02088  -3.871 0.000110 ***

From SAS:

                               Parameter Estimates

                                               Parameter     Standard
  Variable   Label                       DF     Estimate        Error  t Value  Pr > |t|

  Intercept  Intercept                    1     58.20326      1.57802    36.88    <.0001
  SEX        SEX                          1      2.91108      0.35753     8.14    <.0001
  DIABP      Diastolic BP mmHg            1      0.20675      0.01504    13.75    <.0001
  AGE        Age (years) at examination   1     -0.08085      0.02088    -3.87    0.0001

The curious thing is that all parameter estimates agree except the
intercept.  In R I also computed the coefficients directly using
(X'X)^(-1) X'y and got the same coefficients as lm() gave me.
Also, ols() in Design agrees with lm().
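A minimal sketch of that direct computation, on simulated data since the real data set was not posted. The variable names mirror the post, but the data-generating values, n, and the seed are made up. It uses model.matrix() so the design matrix has the same treatment coding as lm() (Men assumed to be the baseline level, which is what the "sexWomen" label suggests), and it solves the normal equations with solve(crossprod(X), crossprod(X, y)) rather than forming the inverse explicitly.

```r
## Sketch on simulated data (hypothetical values; the real data set was not posted)
set.seed(1)
n <- 500
dat <- data.frame(
  sex   = factor(sample(c("Men", "Women"), n, replace = TRUE),
                 levels = c("Men", "Women")),
  diabp = rnorm(n, mean = 80, sd = 10),
  age   = runif(n, min = 30, max = 70)
)
## Made-up true coefficients, loosely echoing the R output in the post
dat$y <- 60 + 3 * (dat$sex == "Women") + 0.2 * dat$diabp -
  0.08 * dat$age + rnorm(n, sd = 4)

## Coefficients from lm() (which uses a QR decomposition internally)
fit <- lm(y ~ sex + diabp + age, data = dat)

## The same design matrix lm() builds: intercept, sexWomen dummy, diabp, age
X <- model.matrix(~ sex + diabp + age, data = dat)
y <- dat$y

## Normal equations (X'X) beta = X'y; solve(A, b) avoids forming (X'X)^(-1)
beta <- drop(solve(crossprod(X), crossprod(X, y)))

print(cbind(lm = coef(fit), normal_eq = beta))
```

Agreement here only shows that the direct computation and lm() fit the same parameterization; it says nothing about how SAS coded SEX.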

As far as I can tell, the data used in R and SAS are identical.  So,
whose answer is correct and how do I prove it?  Here's my sessionInfo
(yes, I know my version of R is oldish).

> sessionInfo()
R version 2.8.0 (2008-10-20)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] Design_2.2-0    survival_2.35-4 Hmisc_3.6-0     lattice_0.17-25

loaded via a namespace (and not attached):
[1] cluster_1.12.0 grid_2.8.0
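On the question of how to prove which answer is correct: the two intercepts differ by 61.11434 - 58.20326 = 2.91108, which is exactly the sex coefficient. That pattern would be consistent with SAS treating SEX as a numeric variable coded 1 = Men, 2 = Women, while R uses treatment coding with Men = 0, Women = 1. This is an assumption, since the SAS code and the coding of SEX are not shown. A hypothetical check on simulated data (made-up values, seed, and n) refits the same model with the two codings and compares the coefficients:

```r
## Hypothetical check on simulated data: recode sex and see what changes
set.seed(2)
n <- 500
sex   <- factor(sample(c("Men", "Women"), n, replace = TRUE),
                levels = c("Men", "Women"))
diabp <- rnorm(n, mean = 80, sd = 10)
age   <- runif(n, min = 30, max = 70)
y <- 60 + 3 * (sex == "Women") + 0.2 * diabp - 0.08 * age + rnorm(n, sd = 4)

## R default: treatment coding, Men = 0 (baseline), Women = 1
fit_factor <- lm(y ~ sex + diabp + age)

## Assumed SAS-style numeric coding: Men = 1, Women = 2
sex_num <- as.integer(sex)  # factor level index: Men -> 1, Women -> 2
fit_numeric <- lm(y ~ sex_num + diabp + age)

cf_f <- coef(fit_factor)
cf_n <- coef(fit_numeric)

## Slopes (sex, diabp, age) should agree between the two fits (differences ~ 0)
print(unname(cf_f[-1] - cf_n[-1]))
## Intercepts should differ by exactly one sex coefficient (result ~ 0)
print(unname(cf_f["(Intercept)"] - cf_n["(Intercept)"] - cf_f["sexWomen"]))
```

Because the 1/2 coding is just a shift of the 0/1 dummy, both fits give identical fitted values; only the meaning of the intercept changes. If SEX in the SAS data set turns out to be coded 0/1 with Men = 0, the intercepts should agree, so that coding is the thing to inspect.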

-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
