[R] different regression coeffs with different starting point

Achim Zeileis Achim.Zeileis at uibk.ac.at
Mon Mar 14 19:59:57 CET 2011


On Mon, 14 Mar 2011, Jen wrote:

> Hi Bill,
> Thanks for your response and I'm sorry -- that was a misleading example of
> what I was trying to show. This one should illustrate the point:
>
> require(AER)
> data_in = c(0,6,12,18,24,30,36,42,48,54,60,66,72,78)
> data_in2 = data_in^2
> data_in3 = data_in^3
> data_out =
> c(139487.00,133333.00,62500.00,58823.00,56338.00,27549.00,4244.00,600.00,112.00,95.00,48.00,31.00,15.00,14.99)
> ldata_out = log(data_out)
>
> m1 <- lm(ldata_out ~ data_in + data_in2+data_in3)
> cubic <- tobit(ldata_out ~ data_in + data_in2 + data_in3, left=
> log(15-0.01),init = coef(m1))

The scaling of your regressors is extremely bad. data_in3 is in the 
hundreds of thousands. This computationally challenging, to put it mildly. 
I would recommend to either scale data_in by some constant (here 10 or 100 
is enough) or to use an orthogonal coding of the polynomial. Both can be 
achieved easily using the poly() function:
   - Your model would be:    ldata_out ~ poly(data_in, 3, raw = TRUE)
   - With scaled regressors: ldata_out ~ poly(data_in/100, 3, raw = TRUE)
   - With orthogonal coding: ldata_out ~ poly(data_in/100, 3)

All models conceptually lead to the same fitted values and the same 
maximal likelihood. Numerically, however, the first model only achieves 
suboptimal values depending on the starting values. The latter two models 
seem to yield more sensible results and be close to the actual maximum.

Note that the coefficients change of course due to the different codings 
of the regressors. This needs to be taken into account if you want to 
compare the slopes.

In summary, I would probably use

cubicz <- tobit(ldata_out ~ poly(data_in/100, 3, raw = TRUE),
   left = log(15-0.01))

which is very robust to the choice of starting values.

hth,
Z

> print(cubic)
> fitted(cubic)
>
> n= length(data_in)
> m2 <- lm(ldata_out[1:(n-1)] ~ data_in[1:(n-1)] +
> data_in2[1:(n-1)]+data_in3[1:(n-1)])
> cubic2 <- tobit(ldata_out ~ data_in + data_in2 + data_in3, left=
> log(15-0.01),init = coef(m2))
> print(cubic2)
> fitted(cubic2)
>
> The models cubic and cubic2 seem to show that the intercept value in the
> model doesn't change from its starting value:
>
>> m1
>
> Call:
> lm(formula = ldata_out ~ data_in + data_in2 + data_in3)
>
> Coefficients:
> (Intercept)      data_in     data_in2     data_in3
>  1.156e+01    1.004e-01   -7.755e-03    6.464e-05
>
>> cubic
>
> Call:
> tobit(formula = ldata_out ~ data_in + data_in2 + data_in3, left = log(15 -
>    0.01), init = coef(m1))
>
> Coefficients:
> (Intercept)      data_in     data_in2     data_in3
> 11.5559540    0.0289083   -0.0040615    0.0000229
>
> Scale: 3.759
>
>> m2
>
> Call:
> lm(formula = ldata_out[1:(n - 1)] ~ data_in[1:(n - 1)] + data_in2[1:(n -
>    1)] + data_in3[1:(n - 1)])
>
> Coefficients:
>        (Intercept)   data_in[1:(n - 1)]  data_in2[1:(n - 1)]  data_in3[1:(n
> - 1)]
>          1.150e+01            1.151e-01           -8.381e-03
> 7.122e-05
>
>> cubic2
>
> Call:
> tobit(formula = ldata_out ~ data_in + data_in2 + data_in3, left = log(15 -
>    0.01), init = coef(m2))
>
> Coefficients:
> (Intercept)      data_in     data_in2     data_in3
>  1.150e+01    3.391e-02   -4.187e-03    2.383e-05
>
> Scale: 3.759
>
> Am I missing something here??
>
> --
> View this message in context: http://r.789695.n4.nabble.com/different-regression-coeffs-with-different-starting-point-tp3353536p3354276.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list