[R] splitting into multiple dataframes and then create a loop to work

Mon Aug 29 21:20:46 CEST 2011

Hi:

Dimitris' solution is appropriate, but it should be mentioned that the
approach I offered earlier in this thread differs from the lmList()
approach. lmList() uses a pooled estimate of the error variance (MSE),
which you can see at the bottom of the output from summary(mlis),
whereas the plyr approach subdivides the data into distinct sub-data
frames and analyzes each one as a separate entity with its own residual
MSE. The coefficient estimates are the same either way, but the residual
MSEs and their degrees of freedom differ between the two approaches,
which in turn changes the standard errors and significance tests of the
model coefficients. You need to decide which approach is better for
your purposes.
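To make the difference concrete, here is a rough base-R sketch (not the plyr code I posted earlier). It fits a separate lm() to each clvar group and computes the t statistic and p-value for var1 twice: once with that group's own residual standard error, and once with a single pooled estimate built from the summed residual sums of squares and degrees of freedom. I am assuming this pooled estimate is the one summary(mlis) reports under lmList()'s default pool = TRUE; see ?lmList and ?pooledSD in nlme to confirm. The names fits, sigma_pool and p_var1 are only illustrative.

```r
# Data from the original post (clvar written with rep(..., each = 10)).
set.seed(1234)
clvar <- rep(1:4, each = 10)
yvar <- rnorm(40, 10, 6)
var1 <- rnorm(40, 10, 4); var2 <- rnorm(40, 10, 4); var3 <- rnorm(40, 5, 2)
var4 <- rnorm(40, 10, 3); var5 <- rnorm(40, 15, 8)
df <- data.frame(clvar, yvar, var1, var2, var3, var4, var5)

# Separate approach: one ordinary lm() per clvar group.
fits <- lapply(split(df, df$clvar),
               function(d) lm(yvar ~ . - clvar, data = d))
sigma_sep <- sapply(fits, function(f) summary(f)$sigma)
dfr <- sapply(fits, df.residual)

# Pooled approach (assumed to match lmList's pooled estimate): add up the
# residual sums of squares and residual degrees of freedom over all groups.
rss <- sapply(fits, function(f) sum(residuals(f)^2))
sigma_pool <- sqrt(sum(rss) / sum(dfr))

# The var1 estimate is the same either way; only its standard error changes.
t_var1 <- t(sapply(fits, function(f) {
  est <- coef(f)[["var1"]]
  unscaled_se <- sqrt(summary(f)$cov.unscaled["var1", "var1"])
  c(separate = est / (summary(f)$sigma * unscaled_se),
    pooled   = est / (sigma_pool * unscaled_se))
}))

# Two-sided p-values: per-group df for the separate fits, summed df when pooled.
p_var1 <- cbind(separate = 2 * pt(-abs(t_var1[, "separate"]), df = dfr),
                pooled   = 2 * pt(-abs(t_var1[, "pooled"]), df = sum(dfr)))
p_var1
```

With 10 rows and 6 coefficients per group, each separate fit has only 4 residual degrees of freedom, while the pooled tests use 16, so the two columns of p_var1 can disagree noticeably even though the var1 estimates are identical.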

Cheers,
Dennis

On Mon, Aug 29, 2011 at 12:02 PM, Dimitris Rizopoulos
<d.rizopoulos at erasmusmc.nl> wrote:
> You can do this using the function lmList() from package nlme, without
> having to split the data frame, e.g.,
>
> library(nlme)
>
> mlis <- lmList(yvar ~ .  - clvar | clvar, data = df)
> mlis
> summary(mlis)
>
>
> I hope it helps.
>
> Best,
> Dimitris
>
>
> On 8/29/2011 5:37 PM, Nilaya Sharma wrote:
>>
>> Dear All
>>
>> Sorry for this simple question; I could not solve it despite spending days on it.
>>
>> My data looks like this:
>>
>> # data
>> set.seed(1234)
>> # I have 100 levels for this factor variable
>> clvar<- c( rep(1, 10), rep(2, 10), rep(3, 10), rep(4, 10))
>> yvar<-  rnorm(40, 10,6);
>> var1<- rnorm(40, 10,4); var2<- rnorm(40, 10,4); var3<- rnorm(40, 5, 2);
>> var4<- rnorm(40, 10, 3); var5<- rnorm(40, 15, 8) # just example
>> df<- data.frame(clvar, yvar, var1, var2, var3, var4, var5)
>>
>> # manual splitting
>> df1<- subset(df, clvar == 1)
>> df2<- subset(df, clvar == 2)
>> df3<- subset(df, clvar == 3)
>> df4<- subset(df, clvar == 4)
>> df5<- subset(df, clvar == 5)
>>
>> # I tried to automate it
>>
>> for(i in 1:5) {
>>     df[i]<- subset(df, clvar == i)
>> }
>>
>> I know it should not work, as df[i] is a single variable, and it did
>> not. But I could not find a way to output multiple data frames from
>> this loop. My limited R knowledge did not help at all!
>>
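The loop fails because df[i] <- ... tries to put a whole data frame into column i of df. A minimal base-R sketch, assuming the goal is just one data frame per clvar level, is to keep them in a list: split() builds that list directly, and an explicit loop can fill one element at a time. The names df_list, df_list2 and levels_cl are hypothetical.

```r
# A smaller version of the example data (only two predictors, for brevity).
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4))

# split() returns a named list of data frames, one per clvar level.
df_list <- split(df, df$clvar)

# The same result with an explicit loop: store each subset as a list element
# instead of assigning it to df[i], which targets a column of df.
levels_cl <- sort(unique(df$clvar))
df_list2 <- vector("list", length(levels_cl))
names(df_list2) <- levels_cl
for (i in seq_along(levels_cl)) {
  df_list2[[i]] <- subset(df, clvar == levels_cl[i])
}

df_list[["2"]]   # what df2 held in the original code
```

Keeping the pieces in one list, rather than in df1, df2, ..., means later steps can use lapply() or sapply() over df_list however many levels clvar has.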
>> # working on each variable, just trying a simple function
>> a <- 3:7  # column positions of var1 to var5
>> out1 <- lapply(1:5, function(ind) {
>>     lm(df1$yvar ~ df1[, a[ind]])
>> })
>> p1 <- lapply(out1, function(m) summary(m)$coefficients[, 4][2])
>> p1 <- do.call(rbind, p1)
>>
>>
>> My ultimate objective is to apply this function to all the data frames
>> created (i.e. df1, df2, df3, df4, df5) and create five corresponding
>> p-value vectors (p1, p2, p3, p4, p5). The output would then be a matrix
>> of clvar and the corresponding p-values:
>> clvar       var1   var2  var3  var4   var5
>> 1
>> 2
>> 3
>> 4
>>
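A sketch of this final step, under the assumption (taken from the lm(df1$yvar ~ df1[, a[ind]]) code above) that each p-value comes from a simple regression of yvar on one predictor at a time within one clvar group. It never creates df1 to df5 as separate objects; split() supplies the sub-data frames and sapply() collects the slope p-values into a clvar-by-variable matrix. slope_pvalues, predictors and pmat are illustrative names.

```r
# Example data from the original post (the wrapped comment is dropped).
set.seed(1234)
clvar <- c(rep(1, 10), rep(2, 10), rep(3, 10), rep(4, 10))
yvar <- rnorm(40, 10, 6)
var1 <- rnorm(40, 10, 4); var2 <- rnorm(40, 10, 4); var3 <- rnorm(40, 5, 2)
var4 <- rnorm(40, 10, 3); var5 <- rnorm(40, 15, 8)
df <- data.frame(clvar, yvar, var1, var2, var3, var4, var5)

predictors <- c("var1", "var2", "var3", "var4", "var5")

# Slope p-value of the simple regression yvar ~ v, for every predictor v,
# within a single sub-data frame d.
slope_pvalues <- function(d) {
  sapply(predictors, function(v) {
    fit <- lm(reformulate(v, response = "yvar"), data = d)
    summary(fit)$coefficients[v, "Pr(>|t|)"]
  })
}

# Apply it to every clvar group: rows are clvar levels, columns are predictors.
pmat <- t(sapply(split(df, df$clvar), slope_pvalues))
round(pmat, 4)
```

Because the rows come from split(), the same code handles 100 clvar levels unchanged. These are one-predictor-at-a-time fits, so the p-values are not the ones from the multiple regression that lmList() fits.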
>> Please help me!
>>
>> Thanks
>>
>> NIL
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Dimitris Rizopoulos
> Assistant Professor
> Department of Biostatistics
> Erasmus University Medical Center
>
> Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
> Tel: +31/(0)10/7043478
> Fax: +31/(0)10/7043014
> Web: http://www.erasmusmc.nl/biostatistiek/
>