[R] Weighted least squares

S Ellison S.Ellison at lgc.co.uk
Wed May 9 15:34:30 CEST 2007

>>> Adaikalavan Ramasamy <ramasamy at cancer.org.uk> 09/05/2007 01:37:31 >>>
>..the variance of means of each row in table above is ZERO because 
>the individual elements that comprise each row are identical. 
>... Then is it valid then to use lm( y ~ x, weights=freq ) ?

ermmm... probably not, because if that happened I'd strongly suspect we'd substantially violated some assumptions.

We are given a number of groups of identical observations. But we are seeking a solution to a problem that posits an underlying variance. If it's not visible within the groups, where is it? Has it disappeared into numerical precision, or is something else going on?
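For concreteness, here is a small made-up example of the situation described (all numbers here are hypothetical): each x value has 'freq' identical y observations, so the within-group variance is exactly zero.

```r
## Hypothetical data in the spirit of the thread: each row of the
## table is one x value with 'freq' identical y observations.
x    <- c(1, 2, 3, 4)
y    <- c(2.1, 3.9, 6.2, 7.8)   # one (repeated) value per group
freq <- c(3, 5, 2, 4)

## Expand to the individual observations
xi <- rep(x, freq)
yi <- rep(y, freq)

## Within-group variances are all zero
tapply(yi, xi, var)
```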

If we did this regression, we would see identical residuals for all members of a group. That would imply that the variance arises entirely from between-group effects and not at all from within-group effects. To me, that would in turn imply that the number of observations in the group is irrelevant; we should be using unweighted regression on the group 'means' in this situation, if we're using least squares at all.
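One way to see what weights=freq actually does (continuing with the hypothetical numbers above): the weighted fit on the group 'means' exactly reproduces the fit to the expanded individual observations, i.e. it treats the duplicates as independent data points. The unweighted fit on the means is, in general, a different answer.

```r
## Hypothetical group 'means' and frequencies
x    <- c(1, 2, 3, 4)
y    <- c(2.1, 3.9, 6.2, 7.8)
freq <- c(3, 5, 2, 4)

fit_w  <- lm(y ~ x, weights = freq)          # weighted fit on the group means
fit_i  <- lm(rep(y, freq) ~ rep(x, freq))    # fit to the expanded observations
fit_uw <- lm(y ~ x)                          # unweighted fit on the group means

## The weighted fit has the same coefficients as the expanded-data fit...
all.equal(unname(coef(fit_w)), unname(coef(fit_i)))
## ...but differs, in general, from the unweighted fit on the means
coef(fit_uw)
```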

If we genuinely have independent observations that by some coincidence have the same value within available precision, we might be justified in saying "we can't see the variance within groups, but we can estimate it from the residual variance". That would be equivalent to assuming constant variance, in which case my suggested weight n/(s^2) reduces to n up to a scaling factor. Using n alone would then be consistent with one's assumptions, I think. On the kind of data I get, though (mostly chemical measurements on continuous scales), I'd have considerable difficulty justifying that assumption. And if I didn't have that kind of data (or a reasonable approximation thereto), I'd be wondering whether I should be using linear regression at all.
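The reduction of n/(s^2) to n under constant variance is just the fact that lm's fit is invariant to rescaling all weights by a common constant, so only relative weights matter. A quick check (hypothetical numbers again; s2 stands for the assumed common within-group variance):

```r
## Hypothetical group means and frequencies
x    <- c(1, 2, 3, 4)
y    <- c(2.1, 3.9, 6.2, 7.8)
freq <- c(3, 5, 2, 4)
s2   <- 0.5   # assumed constant within-group variance (hypothetical)

f1 <- lm(y ~ x, weights = freq)        # weights n
f2 <- lm(y ~ x, weights = freq / s2)   # weights n/s^2

## Identical coefficients: rescaling all weights leaves the fit unchanged
all.equal(coef(f1), coef(f2))
```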

