[R] apply lm function to dataset split by two variables

Dennis Murphy djmuser at gmail.com
Wed Sep 28 12:56:32 CEST 2011


Hi:

Here's one way to do it with the plyr package:

dd <- read.table(textConnection("
year    sps     cm      w
2009    50      16      22
2009    50      17      42
2009    50      18      45
2009    51      15      45
2009    51      16      53
2009    51      17      73
2010    50      15      22
2010    50      16      41
2010    50      16      21
2010    50      17      36
2010    51      15      43
2010    51      16      67
2010    51      17      79"), header = TRUE)
closeAllConnections()

library('plyr')

# Input a data frame, output a list of lm objects
modlist <- dlply(dd, .(sps, year), function(d) lm(w ~ cm, data = d))

# For use in plyr's ldply() function, the utility function should
# return a data frame. We save some effort in simple linear regression
# by noting that the two-sided p-value of the t-test of zero slope is the
# same as that of the overall F test:
extractfun <- function(m) {
       cf <- coef(m)
       tinfo <- summary(m)$coefficients[2, c(2, 4)]
       r2 <- summary(m)$r.squared
       data.frame(intercept = cf[1], slope = cf[2], n = length(resid(m)),
                  slope.se = tinfo[1], pval = tinfo[2], Rsq = r2)
   }

# Take a list (of models) as input and output a data frame:
ldply(modlist, extractfun)

  sps year intercept slope n slope.se      pval       Rsq
1  50 2009 -159.1667  11.5 3 4.907477 0.2567749 0.8459488
2  50 2010  -82.0000   7.0 4 7.141428 0.4303481 0.3245033
3  51 2009 -167.0000  14.0 3 3.464102 0.1544210 0.9423077
4  51 2010 -225.0000  18.0 3 3.464102 0.1210377 0.9642857

HTH,
Dennis

On Wed, Sep 28, 2011 at 1:41 AM, Elena Guijarro
<elena.guijarro at vi.ieo.es> wrote:
>
> Dear all,
>
> I am not fluent in R and am struggling to 1) apply a lm to a weight-size
> dataset, thus the model has to run separately for each species, each
> year; 2) extract coefs, r-squared, n, etc. The data look like this:
>
> year    sps     cm      w
> 2009    50      16      22
> 2009    50      17      42
> 2009    50      18      45
> 2009    51      15      45
> 2009    51      16      53
> 2009    51      17      73
> 2010    50      15      22
> 2010    50      16      41
> 2010    50      16      21
> 2010    50      17      36
> 2010    51      15      43
> 2010    51      16      67
> 2010    51      17      79
>
>
>
> The following script works for data from a single year, but I don't find
> a way to subset the data by sps AND year and get the function running:
>
> f <- function(data) lm(log(w) ~ log(cm+0.5), data = data)
> v <- lapply(split(data, data$sps), f)
>
> and then I can extract the data with this script from Peter Solymos
> (although I do not get the number of points used in the analysis):
>
> myFun <-
> function(lm)
> {
> out <- c(lm$coefficients[1],
>     lm$coefficients[2],
>     length(lm$run1$model$y),
>     summary(lm)$coefficients[2,2],
>     pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
> summary(lm)$fstatistic[3], lower.tail = FALSE),
>     summary(lm)$r.squared)
> names(out) <- c("intercept","slope","n","slope.SE","p.value","r.squared")
> return(out)}
>
> results <- list()
> for (i in 1:length(v)) results[[names(v)[i]]] <- myFun(v[[i]])
> as.data.frame(results)
>
> I have checked the plyr package, but the example that fits my data best
> uses a for loop and I would like to avoid these. I have also tried the
> following (among many other options) without results:
>
> v<-tapply(data$w,list(data$cm,data$year),f)
>
> Error in is.function(FUN) : 'FUN' is missing
>
> Any ideas?
>
> Thanks for your help,
>
> Elena
>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list