[R] polr probit versus stata oprobit

Thomas Lumley tlumley at u.washington.edu
Thu Nov 11 02:15:26 CET 2004


On Wed, 10 Nov 2004, Jean Eid wrote:

> Dear All,
> I have been struggling to understand why for the housing data in MASS
> library R and stata give coef. estimates that are really different. I also
> tried to come up with many many examples myself (see below, of course I
> did not have the set.seed command included) and all of my
> `random' examples seem to give verry similar output. For the housing data,
> I have changed the data into numeric vectors instead of factors/ordered
> factors. I did so to try and get the same results as in STATA and to have
> the housing example as close as possible to the one I constructed.
>
> I run a debian sid, kernel 2.4, R 2.0.0, and STATA version 8.2, MASS
> version  7.2-8.
>
>
> here's the example ( I assume that you have STATA installed and can run in
> batch mode, if not the output is also given below)
>

That example shows the same results with Stata and polr() from MASS.

For the housing data, I also get the same coefficients in Stata as with 
polr():

In R:
library(MASS)
library(foreign)
write.dta(housing, file="housing.dta")
house.probit<-polr(Sat ~ Infl + Type + Cont, data = housing, weights = 
Freq, method = "probit")
summary(house.probit)
-------------------------
Re-fitting to get Hessian

Call:
polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq,
     method = "probit")

Coefficients:
                    Value Std. Error   t value
InflMedium     0.3464233 0.06413706  5.401297
InflHigh       0.7829149 0.07642620 10.244063
TypeApartment -0.3475372 0.07229093 -4.807480
TypeAtrium    -0.2178874 0.09476607 -2.299213
TypeTerrace   -0.6641737 0.09180004 -7.235005
ContHigh       0.2223862 0.05812267  3.826153

Intercepts:
             Value   Std. Error t value
Low|Medium  -0.2998  0.0762    -3.9371
Medium|High  0.4267  0.0764     5.5850

Residual Deviance: 3479.689
AIC: 3495.689
------------------------


In Stata
-----------------
. use housing.dta
. xi: oprobit Sat i.Infl i.Type i.Cont [fw=Freq]
i.Infl            _IInfl_1-3          (naturally coded; _IInfl_1 omitted)
i.Type            _IType_1-4          (naturally coded; _IType_1 omitted)
i.Cont            _ICont_1-2          (naturally coded; _ICont_1 omitted)

Iteration 0:   log likelihood = -1824.4388
Iteration 1:   log likelihood = -1739.9254
Iteration 2:   log likelihood = -1739.8444

Ordered probit estimates                          Number of obs   =    1681
                                                   LR chi2(6)      =   169.19
                                                   Prob > chi2     =   0.0000
Log likelihood = -1739.8444                       Pseudo R2       =   0.0464

------------------------------------------------------------------------------
          Sat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. 
Interval]
-------------+----------------------------------------------------------------
     _IInfl_2 |   .3464228    .064137     5.40   0.000     .2207165     .472129
     _IInfl_3 |   .7829146    .076426    10.24   0.000     .6331224    .9327069
     _IType_2 |  -.3475367   .0722908    -4.81   0.000    -.4892241   -.2058493
     _IType_3 |  -.2178875    .094766    -2.30   0.021    -.4036254   -.0321497
     _IType_4 |  -.6641735   .0917999    -7.24   0.000     -.844098    -.484249
     _ICont_2 |   .2223858   .0581226     3.83   0.000     .1084676     .336304
-------------+----------------------------------------------------------------
        _cut1 |  -.2998279   .0761537          (Ancillary parameters)
        _cut2 |   .4267208   .0764043
------------------------------------------------------------------------------



 	-thomas




More information about the R-help mailing list