[R] log-transformed linear regression

Mike Marchywka marchywka at hotmail.com
Thu Nov 11 16:59:44 CET 2010





I had a few more thoughts but briefly it almost 
looks like exp at low X and lin at higher X may
be something to test. Anyway, I'd be hard pressed to
see how a linear fit, that you are happy to turn to
log-log fit, could give you "a" number of any relevance
except maybe over a few limited ranges.

> plot(X,log(Y))
> sel=(X>2)
> plot(X[sel],log(Y[sel]))
> sel=(X<2)
> plot(X[sel],log(Y[sel]))

I had some longer message ideas but you can keep running models on
selected subsets of your data and see if that one point is your
real problem. And, above all, I wouldn't try to find an error measure
that just helps you match prior literature numbers found via a different method...


----------------------------------------
> From: marchywka at hotmail.com
> To: sa.cizmeli at usherbrooke.ca; r-help at r-project.org
> Date: Thu, 11 Nov 2010 08:03:57 -0500
> Subject: Re: [R] log-transformed linear regression
>
>
>
>
>
>
> ----------------------------------------
> > Date: Wed, 10 Nov 2010 19:27:20 -0500
> > From: sa.cizmeli at usherbrooke.ca
> > To: r-help at r-project.org
> > Subject: Re: [R] log-transformed linear regression
> >
> > Dear List,
> >
> > I would like to take another chance and see if there if someone has
> > anything to say to my last post...
>
> ok, I took a look at it to help myself re-learn some R so
> I solicit better alts from R users in comments below.
>
> If I expand scale down where you have lots of points, the visual
> is something like a plateau for x<1 and then a general noisy
> trend upward- the slope you get for the aggregate blob of
> data probably depends on how your sampling is weighted. I wouldn't
> worry about the one point at large X in isolation quite yet but
> you may want to consider a piecewise linear or other way to handle
> the low X valued data. You may just want to spend more time staring
> at pictures before churning out p-values LOL.
>
> To be clear about your log issues, lm will fit a linear model to the stuff you
> give it and of course y=a*x means something different from log(y) =a*log(x).
> Overall, you need to figure out what to do with the models depending on
> what you think is a candidate for being real- indeed your points at large
> X and Y may be out of the linear range for this "system" but who knows.
>
> If you just try to fit a bunch of data to a line,
> the slope of course depends on which part of this bunch you have
> sampled. Generally to explore sensitivity, try things like this,
> with ".0" of course replaced by whatever threshold you want.
> It is easy to automate, put this in a loop and plot historgrams
> of the resultant slopes etc.
>
> > sel=runif(length(X))>.0
> > aasel=lm(Y[sel]~X[sel])
> > summary(aasel)
>
> Call:
> lm(formula = Y[sel] ~ X[sel])
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.221957 -0.004207  0.004055  0.013395  0.232362
>
> Coefficients:
>               Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.0256642  0.0033254  -7.718 1.54e-12 ***
> X[sel]       0.0463599  0.0003146 147.365  < 2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.0399 on 150 degrees of freedom
> Multiple R-squared: 0.9931,     Adjusted R-squared: 0.9931
> F-statistic: 2.172e+04 on 1 and 150 DF,  p-value: < 2.2e-16
>
> if you increase ".0" to something like .5 or so and keep repating
> this you get some idea of what can happen, make changes for the various
> models and then decide what you are trying to do etc.
>
>
>
> > sel=runif(length(X))>.9
> > aasel=lm(Y[sel]~X[sel])
> > summary(aasel)
>
> Call:
> lm(formula = Y[sel] ~ X[sel])
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.199943 -0.002839  0.010626  0.019608  0.149099
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.035546   0.013741  -2.587   0.0162 *
> X[sel]       0.052033   0.003401  15.301 7.04e-14 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.05878 on 24 degrees of freedom
> Multiple R-squared: 0.907,      Adjusted R-squared: 0.9031
> F-statistic: 234.1 on 1 and 24 DF,  p-value: 7.037e-14
>
>
> >
> > bump
> >
> > servet
> >
> >
> > On 11/10/2010 01:11 PM, servet cizmeli wrote:
> > > Hello,
> > >
> > > I have a basic question. Sorry if it is so evident....
> > >
> > > I have the following data file :
> > > http://ekumen.homelinux.net/mydata.txt
> > >
> > > I need to model Y~X-1 (simple linear regression through the origin) with
> > > these data :
> > >
> > > load(file="mydata.txt")
> > > X=k[,1]
> > > Y=k[,2]
> > >
> > > aa=lm(Y~X-1)
> > > dev.new()
> > > plot(X,Y,log="xy")
> > > abline(aa,untf=T)
> > > abline(b=0.0235, a=0,col="red",untf=T)
> > > abline(b=0.031, a=0,col="green",untf=T)
> > >
> > > Other people did the same kind of analysis with their data and found the
> > > regression coefficients of 0.0235 (red line) and 0.031 (green line).
> > >
> > > Regression with my own data, though, yields a slope of 0.0458 (black
> > > line) which is too high. Clearly my regression is too much influenced by
> > > the single point with high values (X>100). I would not like to discard
> > > this point, though, because I know that the measurement is correct. I
> > > just would like to give it less weight...
> > >
> > > When I log-transform X and Y data, I obtain :
> > >
> > > dev.new()
> > > plot(log10(X),log10(Y))
> > > abline(v=0,h=0,col="cyan")
> > > bb=lm(log10(Y)~log10(X))
> > > abline(bb,col="blue")
> > > bb
> > >
> > > I am happy with this regression. Now the slope is at the log-log domain.
> > > I have to convert it back so that I can obtain a number comparable with
> > > the literature (0.0235 and 0.031). How to do it? I can't force the
> > > second regression through the origin as the log-transformed data does
> > > not go through the origin anymore.
> > >
> > > at first it seemed like an easy problem but I am at loss :o((
> > > thanks a lot for your kindly help
> > > servet
> > >
mented, minimal, self-contained, reproducible code.
 		 	   		  


More information about the R-help mailing list