[R] proportional weights

Thu Feb 6 14:57:42 CET 2014

Dear John,

thanks for the clarification! The lesson to be learned is that one 
should be aware of the fact that weights may mean different things in 
different functions, and sometimes different things in the same function 
(glm)!

Göran

On 02/06/2014 02:17 PM, John Fox wrote:
> Dear Marco and Goran,
>
> Perhaps the documentation could be clearer, but it is after all a
> brief help page. Using weights of 2 to lm() is *not* equivalent to
> entering the observation twice. The weights are variance weights, not
> case weights.
>
> You can see this by looking at the whole summary() output for the
> models, not just the residual standard errors:
>
> ------------- snip ---------
>
>> summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(1,6)))
>
> Call: lm(formula = c(1, 2, 3, 1, 2, 3) ~ c(1, 2.1, 2.9, 1.1, 2, 3),
> weights = rep(1, 6))
>
> Residuals: 1        2        3        4        5        6 0.06477
> -0.08728  0.07487 -0.03996  0.01746 -0.02986
>
> Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)
> -0.11208    0.08066   -1.39    0.237 c(1, 2.1, 2.9, 1.1, 2, 3)
> 1.04731    0.03732   28.07 9.59e-06 *** --- Signif. codes:  0 ‘***’
> 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.07108 on 4 degrees of freedom Multiple
> R-squared:  0.9949,	Adjusted R-squared:  0.9937 F-statistic: 787.6 on
> 1 and 4 DF,  p-value: 9.59e-06
>
>> summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(2,6)))
>
> Call: lm(formula = c(1, 2, 3, 1, 2, 3) ~ c(1, 2.1, 2.9, 1.1, 2, 3),
> weights = rep(2, 6))
>
> Residuals: 1        2        3        4        5        6 0.09160
> -0.12343  0.10589 -0.05652  0.02469 -0.04223
>
> Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)
> -0.11208    0.08066   -1.39    0.237 c(1, 2.1, 2.9, 1.1, 2, 3)
> 1.04731    0.03732   28.07 9.59e-06 *** --- Signif. codes:  0 ‘***’
> 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.1005 on 4 degrees of freedom Multiple
> R-squared:  0.9949,	Adjusted R-squared:  0.9937 F-statistic: 787.6 on
> 1 and 4 DF,  p-value: 9.59e-06
>
> ------------- snip -------------
>
> Notice that while the residual standard errors differ, the
> coefficients and their standard errors are identical. There are
> compensating changes in the residual variance and the weighted sum of
> squares and products matrix for X.
>
> In contrast, literally entering each observation twice reduces the
> coefficient standard errors by a factor of sqrt((6 - 2)/(12 - 2)),
> i.e., the square root of the relative residual df of the models:
>
> ------------- snip --------
>
>> summary(lm(rep(c(1,2,3,1,2,3),2)~rep(c(1,2.1,2.9,1.1,2,3),2)))
>
> Call: lm(formula = rep(c(1, 2, 3, 1, 2, 3), 2) ~ rep(c(1, 2.1, 2.9,
> 1.1, 2, 3), 2))
>
> Residuals: Min        1Q    Median        3Q       Max -0.087276
> -0.039963 -0.006201  0.064768  0.074874
>
> Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept)
> -0.11208    0.05101  -2.197   0.0527 rep(c(1, 2.1, 2.9, 1.1, 2, 3),
> 2)  1.04731    0.02360  44.374 8.12e-13
>
> (Intercept)                       . rep(c(1, 2.1, 2.9, 1.1, 2, 3), 2)
> *** --- Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’
> 1
>
> Residual standard error: 0.06358 on 10 degrees of freedom Multiple
> R-squared:  0.9949,	Adjusted R-squared:  0.9944 F-statistic:  1969 on
> 1 and 10 DF,  p-value: 8.122e-13
>
> ---------- snip -------------
>
> I hope this helps,
>
> John
>
> ------------------------------------------------ John Fox, Professor
> McMaster University Hamilton, Ontario, Canada
> http://socserv.mcmaster.ca/jfox/   On Thu, 6 Feb 2014 09:27:22 +0100
> Göran Broström <goran.brostrom at umu.se> wrote:
>> On 05/02/14 22:40, Marco Inacio wrote:
>>> Hello all, can help clarify something?
>>>
>>> According to R's lm() doc:
>>>
>>>> Non-NULL weights can be used to indicate that different
>>>> observations have different variances (with the values in
>>>> weights being inversely *proportional* to the variances); or
>>>> equivalently, when the elements of weights are positive
>>>> integers w_i, that each response y_i is the mean of w_i
>>>> unit-weight observations (including the case that there are w_i
>>>> observations equal to y_i and the data have been summarized).
>>>
>>> Since the idea here is *proportion*, not equality, shouldn't the
>>> vectors of weights x, 2*x give the same result? And yet they
>>> don't, standard errors differs:
>>>
>>>>> summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(1,6)))$sigma
>>>>
>>>>>
[1] 0.07108323
>>>>> summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(2,6)))$sigma
>>>>
>>>>>
[1] 0.1005269
>>
>> The weights are in fact case weights, i.e., a weight of 2 is the
>> same as including the corresponding item twice. I agree that the
>> documentation is no wonder of clarity in this respect.
>>
>> Btw, note that, in your example, (0.1005269 / 0.07108323)^2 = 2,
>> your constant weight.
>>
>> Göran Broström
>>>
>>>
>>> So what if I know a-priori, observation A has variance 2 times
>>> bigger than observation B? Both weights=c(1,2) and weights=c(2,4)
>>> (and so on) represent very well this knowledge, but we get
>>> different regression (since sigma is different).
>>>
>>>
>>> Also, if we do the same thing with a glm() model, than we get a
>>> lot of other differences like in the deviance.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
>>> posting guide http://www.R-project.org/posting-guide.html and
>>> provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> ______________________________________________ R-help at r-project.org
>> mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
>> read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>