[R] Transformation of Y changes the 'lm' object?

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Jul 20 11:30:12 CEST 2010


On Tue, 2010-07-20 at 13:16 +0530, Shubha Vishwanath Karanth wrote:
> Hi R,
> 
> This is a problem, which I have tried to present in a simple way:
> 
> Let,

> x1=1:10
> 
> x2=2:11
> 
> y=2+3*x1
> 
> lm_obj=lm(y~x1+x2)
> 
> lm_obj
> 
> step(lm_obj) # Step function for the first time

> y=y^0.1
> 
> lm_obj
> 
> step(lm_obj) # Step function after a transformation on Y, but 'lm_obj' is not modified.

You didn't modify lm_obj, so the object itself can't change.

> The two step() calls behave differently. The first is before the
> transformation of 'y', the dependent variable, and the second is
> after the transformation. But please note that I have NOT changed
> 'lm_obj' at all after the transformation. So I was wondering: since
> I have not changed 'lm_obj', shouldn't I get the same stepwise results
> both times? Or does the transformation of y change 'lm_obj' without
> my actually specifying it?

I don't know the exact mechanism, but the results of step() change
because your variables live in the workspace rather than in an explicitly
defined data object. step() update()s the model fitted in lm_obj, which
means refitting it. As you told it (implicitly) to look in the workspace
for y, x1 and x2, and you changed y between the two runs of step(), it
does not surprise me that the results changed. Why this happens probably
has something to do with the environment attached to the model formula or
terms object of the fitted model, but I don't know the specifics well
enough to venture further explanation.
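
The following is a minimal sketch of that behaviour, not a statement of
what step() does internally. It assumes a single predictor and adds
random noise to y (both are assumptions made here so that the fit is not
exact); the names y_orig, stored_ok, refit and refit_differs are
illustrative only. It shows that the fitted object keeps the data it was
fitted to, while a refit via update() picks up whatever y currently is
in the workspace.

```r
x1 <- 1:10
set.seed(1)                      # assumption: noise so the fit is not exact
y  <- 2 + 3 * x1 + rnorm(10)
y_orig <- y                      # keep a copy of the original response
fit <- lm(y ~ x1)                # no 'data' argument: y and x1 come from the workspace

y <- y^0.1                       # change y in the workspace only

## The fitted object still holds the response it was fitted to ...
stored_ok <- isTRUE(all.equal(as.numeric(model.frame(fit)$y), y_orig))

## ... but update() re-evaluates the stored call and finds the current y
refit <- update(fit)
refit_differs <- !isTRUE(all.equal(coef(fit), coef(refit)))

stored_ok                        # TRUE: the stored model frame is unchanged
refit_differs                    # TRUE: the refit used the transformed y
environment(formula(fit))        # the environment searched for y and x1
```

So the object is unchanged, but anything that refits it from the stored
call sees the new y.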

You can avoid this by putting your data into a data frame prior to model
fitting and passing it to lm() via the data argument.

x1 <- 1:10
x2 <- 2:11
y <- 2 + 3 * x1
d <- data.frame(x1 = x1, x2 = x2, y = y)
lmObj <- lm(y ~ x1 + x2)
lmObj2 <- lm(y ~ x1 + x2, data = d)
lmObj
lmObj2

## step them
step(lmObj)
step(lmObj2)

## modify y
y <- y^0.1

## step them again
step(lmObj)  ## this changes as you observed
step(lmObj2) ## this doesn't change
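
If you want to check this programmatically rather than by eye, something
like the sketch below should do. It is only an illustration: x2 is
replaced by a random predictor and noise is added to y (assumptions made
here so that the fit is neither exact nor collinear), and the names
fit_df, before, after and same are hypothetical. trace = 0 merely
suppresses the printed stepwise output.

```r
set.seed(42)
x1 <- 1:10
x2 <- rnorm(10)                  # assumption: an unrelated, non-collinear predictor
y  <- 2 + 3 * x1 + rnorm(10)     # assumption: noise so the fit is not exact
d  <- data.frame(x1 = x1, x2 = x2, y = y)

fit_df <- lm(y ~ x1 + x2, data = d)   # variables are taken from d

before <- step(fit_df, trace = 0)     # model selected before touching y

y <- y^0.1                            # modifies the workspace copy; d$y is untouched

after <- step(fit_df, trace = 0)      # model selected after touching y

## Both runs refit from d, so they should select the same model
same <- identical(formula(before), formula(after)) &&
  isTRUE(all.equal(coef(before), coef(after)))
same
```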

HTH

G

PS: I find it interesting, and slightly perverse, that you space your
individual lines of code out with repeated carriage returns but squash
the code *within* each line together, allowing no whitespace. That really
doesn't help readability.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


