[R] Apply same linear model to subset of dataframe

Thu Nov 8 18:19:37 CET 2012

On Nov 8, 2012, at 4:40 AM, Ross Ahmed wrote:

> There is a slight problem with the code. Due to the collapse="+" argument,
> the code only works if there is more than one predictor variable. How can I
> amend the code so it also works when the number of predictor variables == 1?
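
The collapse="+" argument is not what breaks here: paste() with a single element simply returns that element. A minimal sketch to check this, using the same mtcars variables as the loop; reformulate() from the stats package is shown only as an alternative and was not part of the original code:

```r
# Assumption: same DV/IV names as in the thread; mtcars ships with base R.
dv <- "gear"
iv <- c("carb")          # a single predictor

# paste() with collapse on a length-1 vector returns the element unchanged
rhs <- paste(iv, collapse = "+")
f_string <- paste(dv, rhs, sep = "~")

# The single-predictor formula fits without any problem
fit <- lm(as.formula(f_string), data = mtcars)

# Alternative (not used in the thread): build the formula object directly
f_obj <- reformulate(iv, response = dv)
fit2 <- lm(f_obj, data = mtcars)
```

Both fits give the same coefficients, so the formula construction is fine for one predictor; the failure comes later in the loop.
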
> 
> # WORKING: LENGTH OF EACH SET OF PREDICTORS > 1
> 
> DV <- c("mpg", "drat", "gear")
> IV <- list(c("cyl", "disp", "hp"), c("wt", "qsec"), c("carb", "hp"))
> fits <- vector("list", length(DV))
> 
> for(i in seq(DV)) {
>  fit <- lm(formula=paste(DV[i], paste(IV[[i]], collapse="+"), sep="~"),
> data=mtcars) 
>  plot(fit$fitted, fit$resid, main=paste("DV", DV[i], sep="="))
>  lapply(fit$model[, -1], function(x) plot(x, fit$resid))
>  fits[[i]] <- fit 
> }
> 
> # NOT WORKING: LENGTH OF LAST SET OF PREDICTORS (CARB) == 1
> 
> DV <- c("mpg", "drat", "gear")
> IV <- list(c("cyl", "disp", "hp"), c("wt", "qsec"), c("carb"))
> fits <- vector("list", length(DV))
> 
> for(i in seq(DV)) {
>  fit <- lm(formula=paste(DV[i], paste(IV[[i]], collapse="+"), sep="~"),
> data=mtcars) 
>  plot(fit$fitted, fit$resid, main=paste("DV", DV[i], sep="="))
>  lapply(fit$model[, -1], function(x) plot(x, fit$resid))

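The above line is the culprit. With only one predictor, fit$model[, -1]
leaves a single column, which R drops to a plain vector, so lapply() hands
plot() one value at a time instead of a whole column. Try instead: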

lapply(fit$model[, -1, drop=FALSE], function(x) plot(x, fit$resid))
# ---------------------^^^^^^^^^^^
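
To see the difference outside the loop, here is a small sketch on the failing case (gear ~ carb). The object names dropped and kept are only illustrative:

```r
# mtcars ships with base R (datasets package)
fit <- lm(gear ~ carb, data = mtcars)

# Default drop=TRUE: the one remaining column collapses to a plain vector,
# so lapply() would iterate over its individual values
dropped <- fit$model[, -1]
is.data.frame(dropped)
length(dropped)          # one element per observation

# drop=FALSE keeps a one-column data frame, so lapply() iterates over
# columns, exactly as in the multi-predictor case
kept <- fit$model[, -1, drop = FALSE]
is.data.frame(kept)
names(kept)
```

With drop=FALSE the same lapply() call works for one predictor or several, so the loop needs no special case.
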

>  fits[[i]] <- fit 
> }
> 
> Many thanks
> Ross
> 
> From:  Jean V Adams <jvadams at usgs.gov>
> Date:  Tuesday, 6 November 2012 19:20
> To:  Ross Ahmed <rossahmed at googlemail.com>
> Cc:  <r-help at r-project.org>
> Subject:  Re: [R] Apply same linear model to subset of dataframe
> 
> Ross,
> 
> You can store the lm() results in a list, if you like.
> For example:
> 
> DV <- c("mpg", "drat", "gear")
> IV <- list(c("cyl", "disp", "hp"), c("wt", "qsec"), c("carb", "hp"))
> fits <- vector("list", length(DV))
> 
> for(i in seq(DV)) {
>        fit <- lm(formula=paste(DV[i], paste(IV[[i]], collapse="+"),
> sep="~"), data=mtcars)
>        plot(fit$fitted, fit$resid, main=paste("DV", DV[i], sep="="))
>        lapply(fit$model[, -1], function(x) plot(x, fit$resid))
>        fits[[i]] <- fit
>        }
> 
> Jean
> 
> 
> 
> Ross Ahmed <rossahmed at googlemail.com> wrote on 11/06/2012 09:25:13 AM:
>> 
>> Thanks Jean
>> 
>> This works for the plots, but it only stores the last lm() computed
>> 
>> Ross
>> 
>> From: Jean V Adams <jvadams at usgs.gov>
>> Date: Tuesday, 6 November 2012 14:12
>> To: Ross Ahmed <rossahmed at googlemail.com>
>> Cc: <r-help at r-project.org>
>> Subject: Re: [R] Apply same linear model to subset of dataframe
>> 
>> Ross,
>> 
>> Here's one way to condense the code ...
>> 
>> DV <- c("mpg", "drat", "gear")
>> IV <- list(c("cyl", "disp", "hp"), c("wt", "qsec"), c("carb", "hp"))
>> 
>> for(i in seq(DV)) {
>>        fit <- lm(formula=paste(DV[i], paste(IV[[i]], collapse="+"),
>> sep="~"), data=mtcars)
>>        plot(fit$fitted, fit$resid, main=paste("DV", DV[i], sep="="))
>>        lapply(fit$model[, -1], function(x) plot(x, fit$resid))
>>        }
>> 
>> Jean
>> 
>> 
>> 
>> Ross Ahmed <rossahmed at googlemail.com> wrote on 11/04/2012 09:57:34 AM:
>>> 
>>> I have applied the same linear model to several different subsets of a
>>> dataset. I recently read that in R, code should never be repeated. I feel my
>>> code as it currently stands has a lot of repetition, which could be
>>> condensed into fewer lines. I will use the mtcars dataset to replicate what
>>> I have done. My question is: how can I use fewer lines of code (for example
>>> using a for loop, a function or plyr) to achieve the same output as below?
>>> 
>>> data(mtcars)
>>> 
>>> # Apply the same model to the dataset but choosing different combinations of
>>> # dependent (DV) and independent (IV) variables in each case:
>>> lm.mpg = lm(mpg~cyl+disp+hp, data=mtcars)
>>> lm.drat = lm(drat~wt+qsec, data=mtcars)
>>> lm.gear = lm(gear~carb+hp, data=mtcars)
>>> 
>>> # Plot residuals against fitted values for each model
>>> plot(lm.mpg$fitted,lm.mpg$residuals, main = "lm.mpg")
>>> plot(lm.drat$fitted,lm.drat$residuals, main = "lm.drat")
>>> plot(lm.gear$fitted,lm.gear$residuals, main = "lm.gear")
>>> 
>>> # Plot residuals against IVs for each model
>>> plotResIV <- function (IV,lmResiduals)
>>>  {
>>>  lapply(IV, function (x) plot(x,lmResiduals))
>>> }
>>> 
>>> plotResIV(lm.mpg$model[,-1],lm.mpg$residuals)
>>> plotResIV(lm.drat$model[,-1],lm.drat$residuals)
>>> plotResIV(lm.gear$model[,-1],lm.gear$residuals)
>>> 
>>> Many thanks
>>> Ross Ahmed

David Winsemius, MD
Alameda, CA, USA