[R] naive "collinear" weighted linear regression

David Winsemius dwinsemius at comcast.net
Sat Nov 14 22:51:42 CET 2009


On Nov 14, 2009, at 1:50 PM, Mauricio O Calvao wrote:

> David Winsemius <dwinsemius <at> comcast.net> writes:
>
>
>>
>> Which means those x, y, and "error" figures did not come from an
>> experiment, but rather from theory???
>>
>
> The fact is I am trying to compare the results of:
> (1) lm under R,
> (2) the Java applet at http://omnis.if.ufrj.br/~carlos/applets/reta/reta.html,
> (3) the Fit method of the ROOT system used by CERN, and
> (4) the Numerical Recipes functions for weighted linear regression.
>
> The last three all provide, for the "fake" data set I furnished in my
> first post in this thread, the same results; in particular, they give
> errors or uncertainties in the estimated coefficients of intercept and
> slope which, as seems intuitive, are not zero at all, but of the order
> of 0.1 or 0.2, whereas the lm method under R issues a "Std. Error"
> which is zero. Independently of terminology, which surely is of utmost
> importance, the data I provided should give rise to a best-fit
> straight line with intercept zero and slope 2, but also with
> non-vanishing errors associated with them. How do I get this from lm?
>
> I only want, for instance, calculation of the so-called covariance
> matrix for the estimated coefficients, as given in Equation (2.3.2) of
> the second edition of Draper and Smith, "Applied Regression Analysis";
> this is a standard statistical result, right? So why does R not
> directly provide it as a summary from an lm object?

It's really not that difficult to get the variance-covariance matrix.
What is less clear is why you think differential weighting of a data
set with a perfect fit should give meaningfully different results than
an unweighted fit.

?lm
?vcov
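
For the record, vcov() is exactly the matrix you are asking for:
Draper and Smith's sigma^2 * (X'X)^{-1}, with sigma^2 estimated from
the residuals. That last clause is the crux of the disagreement: ROOT
and the Numerical Recipes routine plug your supplied measurement
errors directly into the covariance matrix, whereas lm estimates the
residual variance from the fit itself, and for a perfect fit that
estimate is zero. A minimal sketch reproducing vcov() by hand, with x
taken from your first post:

 > x <- c(1, 2, 3, 4)      # predictor, from the original post
 > y <- c(2, 4, 6, 8)      # response
 > fit <- lm(y ~ x)
 > X <- model.matrix(fit)  # design matrix: intercept column plus x
 > s2 <- sum(residuals(fit)^2) / df.residual(fit)  # residual variance estimate
 > s2 * solve(crossprod(X))  # Draper & Smith eq. (2.3.2); agrees with vcov(fit)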

 > x <- c(1,2,3,4)              # predictor vector, from the original post
 > y <- c(2,4,6,8)              # response vector
 > error <- c(0.3,0.3,0.3,0.3)  # stated measurement errors
 > fit_mod <- lm(y ~ x, weights = 1/error^2)
 > vcov(fit_mod)
               (Intercept)             x
(Intercept)  2.396165e-30 -7.987217e-31
x           -7.987217e-31  3.194887e-31

Numerically those are effectively zero.
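
The same arithmetic covers the weighted fit: there vcov() is sigma^2 *
(X'WX)^{-1} with the weighted residual variance, which is again zero
because the residuals are zero. Continuing the session above by hand:

 > W <- diag(1/error^2)         # weight matrix built from the error vector
 > X <- model.matrix(fit_mod)
 > s2w <- sum((1/error^2) * residuals(fit_mod)^2) / df.residual(fit_mod)
 > s2w * solve(t(X) %*% W %*% X)  # agrees with vcov(fit_mod): effectively zero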

 > fit_mod <- lm(y~x)
 > vcov(fit_mod)
             (Intercept) x
(Intercept)           0 0
x                     0 0
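
And if you actually run the simulation you describe, with Gaussian
noise around each y_i, the empirical spread of the fitted coefficients
is non-zero, exactly as you expect, because the data are no longer a
perfect fit. A quick sketch, assuming noise with sd = 0.3:

 > set.seed(1)     # for reproducibility
 > nsim <- 10000   # number of simulated data sets
 > sims <- replicate(nsim, coef(lm(y + rnorm(length(y), sd = 0.3) ~ x)))
 > rowMeans(sims)      # close to 0 (intercept) and 2 (slope)
 > apply(sims, 1, sd)  # non-zero, roughly what lm reports as Std. Error
 >                     # when it is fed any one of these noisy data sets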


-- 
David.

>
>
>>>
>>> Of course the best fit coefficients should be 0 for the intercept
>>> and 2 for the slope. Furthermore, it seems completely plausible (or
>>> not?) that, since the y_i have associated non-vanishing
>>> ``errors'' (dispersions), there should be corresponding non-
>>> vanishing ``errors'' associated to the best fit coefficients, right?
>>>
>>> When I try:
>>>
>>>> fit_mod <- lm(y~x,weigths=1/error^2)
>>>
>>> I get
>>>
>>> Warning message:
>>> In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
>>> extra arguments weigths are just disregarded.
>>
>>  (Actually the weights are for adjusting for sampling, and I do not
>> see any sampling in your "design".)
>>
>>>
>>> Carrying on despite the warning message, which I did not quite
>>> understand, when I type:
>>>
>>>> summary(fit_mod)
>>>
>>> I get
>>>
>>> Call:
>>> lm(formula = y ~ x, weigths = 1/error^2)
>>>
>>> Residuals:
>>>        1          2          3          4
>>> -5.067e-17  8.445e-17 -1.689e-17 -1.689e-17
>>>
>>> Coefficients:
>>>            Estimate Std. Error   t value Pr(>|t|)
>>> (Intercept) 0.000e+00  8.776e-17 0.000e+00        1
>>> x           2.000e+00  3.205e-17 6.241e+16   <2e-16 ***
>>> ---
>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>
>>> Residual standard error: 7.166e-17 on 2 degrees of freedom
>>> Multiple R-squared:     1,      Adjusted R-squared:     1
>>> F-statistic: 3.895e+33 on 1 and 2 DF,  p-value: < 2.2e-16
>>>
>>>
>>> Naively, should not the column Std. Error be different from zero??
>>> What I have in mind, which surely is not what Std. Error means, is
>>> that if I carried out a large simulation, assuming each response y_i
>>> is a Gaussian random variable with mean y_i and standard deviation
>>> 2*error = 0.6, and then making an ordinary least squares fit of the
>>> slope and intercept, I would end up with a mean for these simulated
>>> coefficients which should be 2 and 0, respectively,
>>
>> Well, not precisely 2 and 0, but rather something very close, i.e.,
>> within "experimental error". Please note that numbers on the order of
>> 1e-17 are effectively zero from a numerical analysis perspective.
>>
>>
>> http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
>>
>>> .Machine$double.eps ^ 0.5
>> [1] 1.490116e-08
>
> I know this all too well and it is obviously a trivial supernewbie  
> issue, which
> I have already overcome a long time ago...
>
>>
>>> and, that's the point, a non-vanishing standard deviation for these
>>> fitted coefficients, right?? This somehow is what I expected should
>>> be an estimate or, at least, a good indicator, of the degree of
>>> uncertainty which I should assign to the fitted coefficients; it
>>> seems to me these deviations, thus calculated as a result of the
>>> simulation, will certainly not be zero (or 3e-17, for that matter).
>>> So this Std. Error does not provide what I, naively, think should be
>>> given as a measure of the uncertainties or errors in the fitted
>>> coefficients...
>>
>> You are trying to impose an error structure on a data situation that
>> you constructed artificially to be perfect.
>>
>>>
>>> What am I not getting right??
>>
>> That if you input "perfection" into R's linear regression program,  
>> you
>> get appropriate warnings?
>>
>>>
>>> Thanks and sorry for the naive and non-expert question!
>>
>> You are a Professor of physics, right? You do experiments, right? You
>> replicate them. So perhaps I'm the one who should be puzzled.
>
> Unfortunately you eschewed answering any of my questions objectively;
> I insist they do make sense. Saying the data are perfect does not help
> me make any progress in understanding the choice of summary info the
> lm method provides, as compared to what, in my humble opinion and in
> this particular case, it should provide: the covariance matrix of the
> estimated coefficients...
>
>>
>>> -- 
>>> #######################################
>>> Prof. Mauricio Ortiz Calvao
>>> Federal University of Rio de Janeiro
>>> Institute of Physics, P O Box 68528
>>> CEP 21941-972 Rio de Janeiro, RJ
>>> Brazil
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



