[R] How to get variable name while doing series of regressions in an automated manner?

Sun Nov 1 23:06:32 CET 2015

Ravi et al.:

My prior "solution" nagged at me, as I thought it was pretty clumsy --
I was hoping someone would show how to fix it up. As no one did, I
finally realized how to do it myself. Here's how to do the iteration
and get the right labeling with no pasting or formula() call: use
as.name() to turn each variable name into a symbol, then substitute it
via bquote() directly into the (parsed) lm() call before evaluating it.
As one can see, it's a general approach to this sort of thing.
(It's also been offered in the past by others, but I had forgotten it.)

z <- list(y1=rnorm(10,5),y2=rnorm(10,8),x=runif(10))

lapply(names(z)[-3],function(u) {
  eval(bquote(lm(log(.(y)) ~ x, data=z), list(y=as.name(u))))
})
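
A small sketch of how one might confirm that the labeling works, under a
few assumptions of mine that are not in the code above: set.seed() is
added only for reproducibility, and the names "responses" and "fits" are
made up for illustration. Passing a named character vector to lapply()
also labels the list components, so both the list and each stored call
carry the real variable name.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
z <- list(y1 = rnorm(10, 5), y2 = rnorm(10, 8), x = runif(10))

# Hypothetical helper name: every component except the predictor x
responses <- setdiff(names(z), "x")

# Naming the vector makes lapply() return a list named y1, y2
fits <- lapply(setNames(responses, responses), function(u) {
  # as.name(u) turns the string into a symbol; bquote() splices it
  # into the unevaluated lm() call, which eval() then runs
  eval(bquote(lm(log(.(y)) ~ x, data = z), list(y = as.name(u))))
})

# The stored formula of each fit now shows the actual response
calls <- vapply(fits, function(f) deparse(formula(f)), character(1))
print(calls)
```

The same idea should carry over to a data frame with more predictors on
the right-hand side, since only the response symbol is substituted.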

There -- now I feel better. No need to respond.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll

On Tue, Oct 27, 2015 at 10:50 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> Marc, Ravi:
>
> I may misunderstand, but I think Marc's solution labels the list
> components but not necessarily the summary() outputs. This might be
> sufficient, as in:
>
>> z <- list(y1=rnorm(10,5),y2 = rnorm(10,8),x=1:10)
>>
>> ##1
>> results1<-lapply(z[-3],function(y)lm(log(y)~x,data=z))
>> lapply(results1,summary)
> $y1
>
> Call:
> lm(formula = log(y) ~ x, data = z)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -0.2185 -0.1259 -0.0643  0.1340  0.3988
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
> x           -0.01495    0.02317  -0.645    0.537
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.2104 on 8 degrees of freedom
> Multiple R-squared:  0.04945,    Adjusted R-squared:  -0.06937
> F-statistic: 0.4161 on 1 and 8 DF,  p-value: 0.5369
>
>
> $y2
>
> Call:
> lm(formula = log(y) ~ x, data = z)
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.229072 -0.094579 -0.006498  0.134303  0.188158
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
> x           -0.006226   0.016778  -0.371     0.72
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.1524 on 8 degrees of freedom
> Multiple R-squared:  0.01692,    Adjusted R-squared:  -0.106
> F-statistic: 0.1377 on 1 and 8 DF,  p-value: 0.7202
>
>
> ## 2
>
> Alternatively, if you want output with the correct variable names,
> bquote() can be used, as in:
>
>> results2 <-lapply(names(z)[1:2],
> +        function(nm){
> +          fo <-formula(paste0("log(",nm,")~x"))
> +           eval(bquote(lm(.(u),data=z),list(u=fo)))
> +        })
>> lapply(results2,summary)
> [[1]]
>
> Call:
> lm(formula = log(y1) ~ x, data = z)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -0.2185 -0.1259 -0.0643  0.1340  0.3988
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  1.69319    0.14375  11.779 2.47e-06 ***
> x           -0.01495    0.02317  -0.645    0.537
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.2104 on 8 degrees of freedom
> Multiple R-squared:  0.04945,    Adjusted R-squared:  -0.06937
> F-statistic: 0.4161 on 1 and 8 DF,  p-value: 0.5369
>
>
> [[2]]
>
> Call:
> lm(formula = log(y2) ~ x, data = z)
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.229072 -0.094579 -0.006498  0.134303  0.188158
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept)  2.084846   0.104108  20.026 4.03e-08 ***
> x           -0.006226   0.016778  -0.371     0.72
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.1524 on 8 degrees of freedom
> Multiple R-squared:  0.01692,    Adjusted R-squared:  -0.106
> F-statistic: 0.1377 on 1 and 8 DF,  p-value: 0.7202
>
>
> HTH or apologies if I've missed the point and broadcast noise.
>
> Cheers,
> Bert
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>    -- Clifford Stoll
>
>
> On Tue, Oct 27, 2015 at 8:19 AM, Ravi Varadhan <ravi.varadhan at jhu.edu> wrote:
>> Hi,
>>
>> I am running through a series of regressions in a loop as follows:
>>
>> results <- vector("list", length(mydata$varnames))
>>
>> for (i in 1:length(mydata$varnames)) {
>> results[[i]] <- summary(lm(log(eval(parse(text=varnames[i]))) ~ age + sex + CMV.status, data=mydata))
>> }
>>
>> Now, when I look at the results[[i]] objects, I won't be able to see the original variable names.  Obviously, I will only see the following:
>>
>> Call:
>> lm(formula = log(eval(parse(text = varnames[i]))) ~ age + sex + CMV.status,
>>     data = mydata)
>>
>>
>> Is there a way to display the original variable names on the LHS?  In addition, is there a better paradigm for doing this type of series of regressions in an automatic fashion?
>>
>> Thank you very much,
>> Ravi
>>
>> Ravi Varadhan, Ph.D. (Biostatistics), Ph.D. (Environmental Engg)
>> Associate Professor,  Department of Oncology
>> Division of Biostatistics & Bioinformatics
>> Sidney Kimmel Comprehensive Cancer Center
>> Johns Hopkins University
>> 550 N. Broadway, Suite 1111-E
>> Baltimore, MD 21205
>> 410-502-2619