[R] How to get variable name while doing series of regressions in an automated manner?

Ravi Varadhan ravi.varadhan at jhu.edu
Tue Oct 27 19:27:10 CET 2015


Thank you very much, Marc & Bert.

Bert - I think you're correct.  With Marc's solution, I am not able to get the response variable name in the call to lm().  But, your solution works well.

Best regards,
Ravi

-----Original Message-----
From: Bert Gunter [mailto:bgunter.4567 at gmail.com] 
Sent: Tuesday, October 27, 2015 1:50 PM
To: Ravi Varadhan <ravi.varadhan at jhu.edu>
Cc: r-help at r-project.org
Subject: Re: [R] How to get variable name while doing series of regressions in an automated manner?

Marc, Ravi:

I may misunderstand, but I think Marc's solution labels the list components but not necessarily the summary() outputs. This might be sufficient, as in:

> z <- list(y1=rnorm(10,5),y2 = rnorm(10,8),x=1:10)
>
> ## 1
> results1<-lapply(z[-3],function(y)lm(log(y)~x,data=z))
> lapply(results1,summary)
$y1

Call:
lm(formula = log(y) ~ x, data = z)

Residuals:
    Min      1Q  Median      3Q     Max
-0.2185 -0.1259 -0.0643  0.1340  0.3988

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
x           -0.01495    0.02317  -0.645    0.537
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2104 on 8 degrees of freedom
Multiple R-squared:  0.04945,    Adjusted R-squared:  -0.06937
F-statistic: 0.4161 on 1 and 8 DF,  p-value: 0.5369


$y2

Call:
lm(formula = log(y) ~ x, data = z)

Residuals:
      Min        1Q    Median        3Q       Max
-0.229072 -0.094579 -0.006498  0.134303  0.188158

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
x           -0.006226   0.016778  -0.371     0.72
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1524 on 8 degrees of freedom
Multiple R-squared:  0.01692,    Adjusted R-squared:  -0.106
F-statistic: 0.1377 on 1 and 8 DF,  p-value: 0.7202
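
Since the list names survive lapply(), they can still label anything collected from the fits, even though every Call reads log(y). A minimal sketch, assuming the same toy list z as above; set.seed(1) and the name slopes are additions for illustration only:

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
z <- list(y1 = rnorm(10, 5), y2 = rnorm(10, 8), x = 1:10)

# Approach 1: each call shows log(y), but the list keeps the names y1, y2
results1 <- lapply(z[-3], function(y) lm(log(y) ~ x, data = z))

# Collect the slope on x from each fit; sapply() carries the list names over
slopes <- sapply(results1, function(fit) coef(fit)[["x"]])
names(slopes)
```

This may be enough when only a table of estimates is needed and the printed Call does not matter.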


## 2

Alternatively, if you want output with the correct variable names,
bquote() can be used, as in:

> results2 <-lapply(names(z)[1:2],
+        function(nm){
+          fo <-formula(paste0("log(",nm,")~x"))
+           eval(bquote(lm(.(u),data=z),list(u=fo)))
+        })
> lapply(results2,summary)
[[1]]

Call:
lm(formula = log(y1) ~ x, data = z)

Residuals:
    Min      1Q  Median      3Q     Max
-0.2185 -0.1259 -0.0643  0.1340  0.3988

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
x           -0.01495    0.02317  -0.645    0.537
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2104 on 8 degrees of freedom
Multiple R-squared:  0.04945,    Adjusted R-squared:  -0.06937
F-statistic: 0.4161 on 1 and 8 DF,  p-value: 0.5369


[[2]]

Call:
lm(formula = log(y2) ~ x, data = z)

Residuals:
      Min        1Q    Median        3Q       Max
-0.229072 -0.094579 -0.006498  0.134303  0.188158

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
x           -0.006226   0.016778  -0.371     0.72
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1524 on 8 degrees of freedom
Multiple R-squared:  0.01692,    Adjusted R-squared:  -0.106
F-statistic: 0.1377 on 1 and 8 DF,  p-value: 0.7202
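
Note that results2 is printed as [[1]], [[2]] because lapply() over a plain character vector returns an unnamed list. A minimal sketch that keeps both the list names and the informative Call, assuming the same toy list z; set.seed(1), the name responses, and the use of setNames() are additions for illustration, not part of the solution above:

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
z <- list(y1 = rnorm(10, 5), y2 = rnorm(10, 8), x = 1:10)

# Response names: every component of z except the predictor x
responses <- setdiff(names(z), "x")

# Iterating over a named character vector makes lapply() return a named list
results2 <- lapply(setNames(responses, responses), function(nm) {
  fo <- as.formula(paste0("log(", nm, ") ~ x"))
  # bquote() splices the built formula into the call before lm() sees it,
  # so the stored call shows log(y1) ~ x rather than a placeholder
  eval(bquote(lm(.(fo), data = z)))
})

names(results2)
results2$y1$call
```

With this, lapply(results2, summary) should print under $y1 and $y2, and each Call should show the actual response variable.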


HTH or apologies if I've missed the point and broadcast noise.

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll


On Tue, Oct 27, 2015 at 8:19 AM, Ravi Varadhan <ravi.varadhan at jhu.edu> wrote:
> Hi,
>
> I am running through a series of regressions in a loop as follows:
>
> results <- vector("list", length(mydata$varnames))
>
> for (i in 1:length(mydata$varnames)) {
>   results[[i]] <- summary(lm(log(eval(parse(text=varnames[i]))) ~ age + sex + CMV.status,
>                              data=mydata))
> }
>
> Now, when I look at the results[[i]] objects, I cannot see the original variable names.  I only see the following:
>
> Call:
> lm(formula = log(eval(parse(text = varnames[i]))) ~ age + sex + CMV.status,
>     data = mydata)
>
>
> Is there a way to display the original variable names on the LHS?  In addition, is there a better paradigm for running this kind of series of regressions automatically?
>
> Thank you very much,
> Ravi
>
> Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
> Associate Professor, Department of Oncology
> Division of Biostatistics & Bioinformatics
> Sidney Kimmel Comprehensive Cancer Center
> Johns Hopkins University
> 550 N. Broadway, Suite 1111-E
> Baltimore, MD 21205
> 410-502-2619