# [R] lm for log log

(Ted Harding) Ted.Harding at manchester.ac.uk
Mon Jun 21 03:47:23 CEST 2010

```
On 21-Jun-10 01:30:26, David Winsemius wrote:
> On Jun 20, 2010, at 9:14 PM, David Winsemius wrote:
>> On Jun 20, 2010, at 8:17 PM, (Ted Harding) wrote:
>>
>>> On 20-Jun-10 19:54:02, David Winsemius wrote:
>>>> On Jun 20, 2010, at 1:38 PM, Ekaterina Pek wrote:
>>>>> Hi, Ted.
>>>>> Thanks for your reply. It helped. I have further a bit of
>>>>> questions.
>>>>>
>>>>>> It may be that lm(log(b) ~ log(a)) is, from a substantive point of
>>>>>> view, a more appropriate model for whatever it is than lm(b ~ a).
>>>>>> Or it may not be. This is a separate question. Again, Spearman's
>>>>>> rho is not definitive.
>>>>>
>>>>> How one determines if one linear model is more appropriate than
>>>>> another ?
>>>>> And : linear model "log(b) ~ log(a)" is okay ? I hesitated to use
>>>>> such thing from the beginning, because it seemed to me like it
>>>>> would have meant a nonlinear model rather than linear.. (Sorry, if
>>>>> the question is stupid, I'm not that good at statistics)
>>>>
>>>> Your earlier description of the plots made me think both "a" and "b"
>>>> were right-skewed. Such a situation (if my interpretation were
>>>> correct) would seriously undermine the statistical validity of an
>>>> analysis like lm(a ~ b) .
>>>> --
>>>> David Winsemius, MD
>>>
>>> That doesn't follow. If b is linearly related to a:
>>>   b = A + B*a + error,
>>> and if the distribution of a is highly skewed, then so also will be
>>> the distribution of b, even if the error is a nice Gaussian error
>>> with constant variance (and small compared with the dispersion
>>> of a & b).
>>
>> Yes, but that was not what was suggested in the OP's description of
>> the scatterplot of a and b.
>
> Or rather I should say that is not the data picture that came to mind.
> Your theory can be visualized as:
>
>  > a <- rlnorm(3000)
>  > b <- 1 + 2*a +rnorm(3000)
>  > plot(a,b)
>
> Mine was a more heteroskedastic picture:
>  > a <- rlnorm(3000)
>  > b <- rlnorm(3000)
>  > plot(a,b)
>
> --
> David Winsemius, MD

And of course either is possible, given Ekaterina's description:

There is some linear correlation but the picture "plot(a,b)
+ abline(lm(b~a))" is quite crowded in the left lower corner.
The picture "plot(log(a), log(b)) + abline(lm(log(b)~log(a)))"
is much nicer ("Milky Way").

Your visualisation of "my theory" is pretty well exactly the sort
of thing I had in mind for "lm(b ~ a)".

The only thing that might suggest your picture, or perhaps
rather a power-law picture lm(log(b) ~ log(a)), is the indication
"Milky Way" which, if likened to the narrow band of galactic
stars which has pretty much constant width, would suggest that
the dispersion (in log(b)) is roughly independent of a, and hence
that, on the raw scale, the dispersion in b would increase with a.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 21-Jun-10                                       Time: 02:47:18
------------------------------ XFMail ------------------------------

```
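
The distinction drawn in the message above, between additive Gaussian error on the raw scale and roughly constant dispersion on the log scale, can be checked numerically. What follows is only a minimal sketch under stated assumptions: the power-law coefficients (0.5, 0.8), the error SD (0.3), the seed, and the helper `spread_ratio` are all invented for illustration and do not come from the thread or from Ekaterina's data. It compares the residual spread in the upper and lower halves of the predictor for each fit.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 3000
a <- rlnorm(n)                   # right-skewed predictor, as in the thread

# Scenario 1: linear on the raw scale, constant-variance Gaussian error
b_lin   <- 1 + 2 * a + rnorm(n)
fit_lin <- lm(b_lin ~ a)

# Scenario 2 (assumed power law): linear on the log scale, so the error
# is multiplicative on the raw scale. Coefficients are illustrative only.
b_pow       <- exp(0.5 + 0.8 * log(a) + rnorm(n, sd = 0.3))
fit_pow_raw <- lm(b_pow ~ a)
fit_pow_log <- lm(log(b_pow) ~ log(a))

# Hypothetical helper: residual SD above the median of x divided by the
# residual SD at or below it. Values near 1 suggest constant dispersion.
spread_ratio <- function(fit, x) {
  r  <- residuals(fit)
  hi <- x > median(x)
  sd(r[hi]) / sd(r[!hi])
}

ratios <- c(
  lin_raw = spread_ratio(fit_lin, a),
  pow_raw = spread_ratio(fit_pow_raw, a),
  pow_log = spread_ratio(fit_pow_log, log(a))
)
print(round(ratios, 2))
```

Under these assumptions the ratio should sit near 1 for the raw-scale fit in scenario 1 and for the log-log fit in scenario 2, and well above 1 when the power-law data are fitted on the raw scale. That is the "dispersion in b would increase with a" pattern. Plotting `residuals(fit)` against the predictor would show the same thing graphically.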