[R] Get variable names from results of lm()

Thu May 24 02:48:44 CEST 2012

On May 23, 2012, at 2:52 PM, arun wrote:

> Hi Marc,
> 
> Just to point out some difference,
> 
> 
> x <- 1:20
> y <- x + (x/4 - 2)^3 + rnorm(20, sd = 3)
> names(y) <- paste("O", x, sep = ".")
> ww <- rep(1, 20); ww[13] <- 0
> summary(lmxy <- lm(y ~ x + I(x^2) + I(x^3) + I((x - 10)^2),
>                    weights = ww), cor = TRUE)
> 
> 
>> all.vars(formula(lmxy))
> [1] "y" "x"
> 
> 
>> variable.names(lmxy)
> [1] "(Intercept)" "x"           "I(x^2)"      "I(x^3)"   
> 
> 
> 
> A.K.

<snip>

Hi Arun,

Note that as long as the model terms are not factors (or other terms that get 'expanded' into several columns), variable.names() will return the names of the terms, plus of course the intercept. I suspect, however, that in your example you might want:

> variable.names(lmxy, full = TRUE)
[1] "(Intercept)"   "x"             "I(x^2)"        "I(x^3)"       
[5] "I((x - 10)^2)"

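since the last term was dropped in your output: I((x - 10)^2) is a linear combination of the intercept, x and I(x^2), so it is aliased and omitted by default. Note that you would get essentially the same information from: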

> names(coef(lmxy))
[1] "(Intercept)"   "x"             "I(x^2)"        "I(x^3)"       
[5] "I((x - 10)^2)"

again, with no factors present.
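
To see which term was dropped without comparing the two listings by eye, something along these lines should work. It is only a sketch: the seed is my own arbitrary choice (the original example did not set one), and the object names `cf`, `aliased` and `dropped` are just for illustration.

```r
set.seed(1)  # arbitrary seed, an assumption added for reproducibility
x <- 1:20
y <- x + (x/4 - 2)^3 + rnorm(20, sd = 3)
ww <- rep(1, 20); ww[13] <- 0
lmxy <- lm(y ~ x + I(x^2) + I(x^3) + I((x - 10)^2), weights = ww)

# An aliased (linearly dependent) term gets an NA coefficient
cf <- coef(lmxy)
aliased <- names(cf)[is.na(cf)]
aliased

# The same name is the difference between the full and default listings
dropped <- setdiff(variable.names(lmxy, full = TRUE), variable.names(lmxy))
dropped
```

Both `aliased` and `dropped` should contain the single name "I((x - 10)^2)", whatever the random noise in y happens to be, since the aliasing comes from the design matrix and not from the response.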

However, with factors present, consider:

> LM <- lm(Sepal.Length ~ ., data = iris)

> all.vars(formula(LM))
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
[5] "Species"  

versus:

> variable.names(LM)
[1] "(Intercept)"       "Sepal.Width"       "Petal.Length"     
[4] "Petal.Width"       "Speciesversicolor" "Speciesvirginica" 

That does no better than:

> names(coef(LM))
[1] "(Intercept)"       "Sepal.Width"       "Petal.Length"     
[4] "Petal.Width"       "Speciesversicolor" "Speciesvirginica" 

This is because variable.names() is essentially getting its information from:

> colnames(lmxy$qr$qr)
[1] "(Intercept)"   "x"             "I(x^2)"        "I(x^3)"       
[5] "I((x - 10)^2)"

> colnames(LM$qr$qr)
[1] "(Intercept)"       "Sepal.Width"       "Petal.Length"     
[4] "Petal.Width"       "Speciesversicolor" "Speciesvirginica" 
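
As a rough check of that claim, the default result of variable.names() can be rebuilt by hand from the QR column names and the rank of the fit. This is a sketch based on my reading of how the lm method behaves, not a copy of its source; `qr_names` and `by_hand` are illustrative names.

```r
LM <- lm(Sepal.Length ~ ., data = iris)

# Column names of the (pivoted) QR decomposition stored in the fit
qr_names <- colnames(LM$qr$qr)

# The default (full = FALSE) listing keeps only the first 'rank' columns,
# i.e. those that were actually estimated
by_hand <- qr_names[seq_len(LM$rank)]

identical(by_hand, variable.names(LM))
identical(qr_names, variable.names(LM, full = TRUE))
```

For a full-rank fit such as this one the two listings coincide; they differ only when a column is aliased, as in the lmxy example.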

Other options include:

> labels(terms(lmxy))
[1] "x"             "I(x^2)"        "I(x^3)"        "I((x - 10)^2)"

> labels(terms(LM))
[1] "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"    

which gets the information from the 'term.labels' attribute of the model terms object, that is, the terms on the right-hand side (RHS) of the formula:

> attr(terms(LM), "term.labels")
[1] "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species" 

You could also use:

> colnames(model.frame(LM))
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
[5] "Species"     

> colnames(model.frame(lmxy))
[1] "y"             "x"             "I(x^2)"        "I(x^3)"       
[5] "I((x - 10)^2)" "(weights)" 

This gives slightly different information (the response is included, as is a "(weights)" column when weights are used), but shows that there is more than one way to get information from an R object, depending upon your needs.
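
If the goal is to tie the expanded coefficient names back to the term they came from, one possibility is the "assign" attribute of the model matrix, which records the term index for each column (0 for the intercept). A minimal sketch, with `coef_to_term` being a name I made up for the lookup vector:

```r
LM <- lm(Sepal.Length ~ ., data = iris)

mm <- model.matrix(LM)
asgn <- attr(mm, "assign")        # 0 = intercept, k = k-th term label
term_labels <- labels(terms(LM))

# Look up the originating term for every column of the model matrix
coef_to_term <- c("(Intercept)", term_labels)[asgn + 1L]
names(coef_to_term) <- colnames(mm)
coef_to_term
```

Here both "Speciesversicolor" and "Speciesvirginica" map back to "Species", while the numeric predictors map to themselves.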

Let me throw another twist into the mix:

> variable.names(lm(y ~ poly(x, 3)))
[1] "(Intercept)" "poly(x, 3)1" "poly(x, 3)2" "poly(x, 3)3"

> all.vars(formula(lm(y ~ poly(x, 3))))
[1] "y" "x"

> labels(terms(lm(y ~ poly(x, 3))))
[1] "poly(x, 3)"

> colnames(model.frame(lm(y ~ poly(x, 3))))
[1] "y"          "poly(x, 3)"

Which output do you want? That depends upon the use case. One needs to be cautious in proposing a generic solution when the underlying problem has not yet been precisely defined.
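
For comparing the candidates side by side on a given fit, a small helper along these lines may be convenient. `name_views()` is a hypothetical function of my own, not something in base R, and the seed is again an arbitrary assumption.

```r
# Hypothetical helper: collect the different "name" views of a fitted model
name_views <- function(fit) {
  list(
    data_variables = all.vars(formula(fit)),     # variables as typed in the formula
    term_labels    = labels(terms(fit)),         # RHS terms, unexpanded
    coefficients   = names(coef(fit)),           # expanded, includes the intercept
    frame_columns  = colnames(model.frame(fit))  # response plus terms as evaluated
  )
}

set.seed(1)  # arbitrary seed, an assumption added for reproducibility
x <- 1:20
y <- x + (x/4 - 2)^3 + rnorm(20, sd = 3)

views <- name_views(lm(y ~ poly(x, 3)))
str(views)
```

Which element of the list is the "right" one still depends on what the names will be used for.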

Food for thought...

Regards,

Marc