[R] splitting into multiple dataframes and then create a loop to work

Dennis Murphy djmuser at gmail.com
Mon Aug 29 20:54:58 CEST 2011


Hi:

This is straightforward to do with the plyr package:

# install.packages('plyr')
library('plyr')
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))
mods <- dlply(df, .(clvar), function(d) lm(yvar ~ . - clvar, data = d))
summary(mods[[1]])
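About the loop in your question: df[i] <- subset(...) tries to put a whole data frame into column i of df, so it can never give you separate data frames. If you prefer an explicit loop, a minimal sketch (assuming the same example df as above) is to collect each subset in one element of a list rather than creating df1, df2, ... as separate objects. Base R's split() builds the same kind of list in one call. The names levs, dfs and dfs2 are only illustrative.

```r
# Same example data as in the plyr code above
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

# Loop over the levels actually present, so this also works with 100 levels
levs <- sort(unique(df$clvar))
dfs <- vector("list", length(levs))
names(dfs) <- levs
for (i in seq_along(levs)) {
  # [[ ]] stores a whole data frame in a single list slot
  dfs[[i]] <- subset(df, clvar == levs[i])
}

# Base R shortcut: a named list of data frames, one per level of clvar
dfs2 <- split(df, df$clvar)

sapply(dfs, nrow)  # number of rows in each subgroup
```

Each subgroup is then dfs[[1]], dfs[["2"]], and so on, and a function can be applied to all of them with lapply(dfs, ...) instead of repeating code for df1, df2, ...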

mods is a list of model objects, one per subgroup defined by clvar.
You can use extraction functions to pull out pieces from each model,
e.g.,

ldply(mods, function(m) summary(m)[['r.squared']])
ldply(mods, function(m) coef(m))
ldply(mods, function(m) resid(m))
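Since your goal was a table of p-values by clvar, the same pattern works on the Pr(>|t|) column of the coefficient table. A sketch, assuming the df and mods objects from the code above. Note that these p-values come from one multiple regression per group (all five predictors together), not from five separate simple regressions as in your lapply() code.

```r
library(plyr)
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))
mods <- dlply(df, .(clvar), function(d) lm(yvar ~ . - clvar, data = d))

# Column 4 of summary(m)$coefficients is Pr(>|t|); [-1, 4] drops the
# intercept row, leaving one p-value per predictor (var1 ... var5)
pvals <- ldply(mods, function(m) summary(m)$coefficients[-1, 4])
pvals  # one row per clvar level, one column per predictor
```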

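The dlply() function takes a data frame as input and returns a list;
conversely, the ldply() function takes a list as input and returns a
data frame. The function you call inside must accept one piece of the
input (a sub-data frame for dlply(), a list element for ldply()) and
return something that fits the output type (any object for dlply(), a
vector or data frame for ldply()).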
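If what you want is exactly the matrix sketched in your message (a separate simple regression of yvar on each predictor within each clvar group), one way without plyr is to split the data and nest two sapply() calls. This is a sketch that assumes the predictors are the columns named var1 to var5; vars, pieces and pmat are illustrative names.

```r
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

vars <- paste0("var", 1:5)        # predictors to test one at a time
pieces <- split(df, df$clvar)     # named list: one data frame per clvar

# For each group, fit yvar ~ v for every predictor v and keep the slope's
# p-value (row 2, column 4 of the coefficient table)
pmat <- t(sapply(pieces, function(d) {
  sapply(vars, function(v) {
    fit <- lm(reformulate(v, response = "yvar"), data = d)
    summary(fit)$coefficients[2, 4]
  })
}))
pmat  # rows: clvar levels; columns: var1 ... var5
```

The rows of pmat are labelled by the clvar levels, so the same code should work unchanged with 100 levels.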

HTH,
Dennis


On Mon, Aug 29, 2011 at 8:37 AM, Nilaya Sharma <nilaya.sharma at gmail.com> wrote:
> Dear All
>
> Sorry for this simple question; I could not solve it despite spending days on it.
>
> My data looks like this:
>
> # data
> set.seed(1234)
> clvar <- c(rep(1, 10), rep(2, 10), rep(3, 10), rep(4, 10)) # I have 100 levels for this factor variable
> yvar <-  rnorm(40, 10,6);
> var1 <- rnorm(40, 10,4); var2 <- rnorm(40, 10,4); var3 <- rnorm(40, 5, 2);
> var4 <- rnorm(40, 10, 3); var5 <- rnorm(40, 15, 8) # just example
> df <- data.frame(clvar, yvar, var1, var2, var3, var4, var5)
>
> # manual splitting
> df1 <- subset(df, clvar == 1)
> df2 <- subset(df, clvar == 2)
> df3<- subset(df, clvar == 3)
> df4<- subset(df, clvar == 4)
> df5<- subset(df, clvar == 5)
>
> # I tried to mechanize it
> for (i in 1:5) {
>   df[i] <- subset(df, clvar == i)
> }
>
> I know it should not work, as df[i] is a single variable, and indeed it did
> not. But I could not find a way to output multiple data frames from this
> loop. My limited R knowledge did not help at all!
>
> # working on each variable, just trying a simple function
> a <- 3:8
> out1 <- lapply(1:5, function(ind) {
>   lm(df1$yvar ~ df1[, a[ind]])
> })
> p1 <- lapply(out1, function(m) summary(m)$coefficients[, 4][2])
> p1 <- do.call(rbind, p1)
>
>
> My ultimate objective is to apply this function to all the data frames
> created (i.e. df1, df2, df3, df4, df5) and create five corresponding p-value
> vectors (p1, p2, p3, p4, p5). The output would then be a matrix of clvar and
> the corresponding p-values:
> clvar       var1   var2  var3  var4   var5
> 1
> 2
> 3
> 4
>
> Please help me !
>
> Thanks
>
> NIL
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


