[R] How to Un-group a grouped data set?

R. Michael Weylandt michael.weylandt at gmail.com
Tue May 15 08:08:30 CEST 2012


do.call is a nifty and surprisingly useful construct whenever you need to
build a function call programmatically or apply a function to a list of
arguments.
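
As a minimal sketch of the idea (the list `pieces` and the other object
names here are made up purely for illustration), do.call() takes a function
and a list, and calls the function with the list elements as its arguments:

```r
# Three small pieces to combine (illustrative data only).
pieces <- list(1:3, 4:6, 7:9)

# do.call() passes each list element as a separate argument,
# so these two lines build the same 3 x 3 matrix.
m1 <- do.call("rbind", pieces)
m2 <- rbind(pieces[[1]], pieces[[2]], pieces[[3]])

# Named list elements become named arguments.
s <- do.call(paste, list("a", "b", sep = "-"))  # "a-b"
```

The payoff is that the list can have any length, which is usually the case
when it comes out of lapply().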

R News 2/2 has some useful tips on this and related functions in the
Programmer's Niche section if you're interested.
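
For the data in the quoted thread below, a rough sketch of the whole
un-grouping might look like the following. The helper name `expand_arm` and
the output column name `AE` are assumptions made for this example, not
anything from the original posts, and it assumes the positives should be
listed first within each arm, as in the desired output quoted below.

```r
# Grouped data from the thread: one row per study arm.
dats <- data.frame(Study = c(1L, 1L, 2L, 2L, 3L, 3L),
                   TX    = c(1L, 0L, 1L, 0L, 1L, 0L),
                   AEs   = c(3L, 2L, 1L, 2L, 1L, 1L),
                   N     = c(5L, 7L, 10L, 7L, 8L, 4L))

# Hypothetical helper: expand row i into N[i] subject-level rows,
# the first AEs[i] of them positive (1) and the rest negative (0).
expand_arm <- function(i) {
  n <- dats$N[i]
  k <- dats$AEs[i]
  data.frame(Study = rep(dats$Study[i], n),
             TX    = rep(dats$TX[i], n),
             AE    = rep(c(1L, 0L), times = c(k, n - k)))
}

# lapply() returns a list of six data frames; do.call() hands them
# to rbind() as separate arguments, giving one long data frame.
long <- do.call("rbind", lapply(seq_len(nrow(dats)), expand_arm))
```

Returning a data frame rather than a matrix from the helper keeps the column
names, so the stacked result is ready for modelling functions as is.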

Best,
Michael

On Tue, May 15, 2012 at 2:05 AM, Cheenghee AM Koh <sigontw at gmail.com> wrote:
> Thank you so much!  I can't believe I spent the whole night not knowing
> this one command, "do.call".
> This is so handy!
> Best, Koh
>
>
> On Tue, May 15, 2012 at 12:52 AM, R. Michael Weylandt
> <michael.weylandt at gmail.com> wrote:
>>
>> Sorry -- I missed the bit about the AE in your original post. Perhaps
>> you can work with my bit for the repeats, but it looks like if you
>> want to use your function, it should suffice to do something like
>>
>> do.call("rbind", lapply(1:6, NewFuncName))
>>
>> Best,
>> Michael
>>
>> On Tue, May 15, 2012 at 1:50 AM, R. Michael Weylandt
>> <michael.weylandt at gmail.com> wrote:
>> > Don't use subset as a function name -- it's already the name of a
>> > rather important function, as is data (though at least you aren't
>> > using that one as a function, so it's not quite so bad). Finally, use
>> > dput() when sending data so we get a plain-text, reproducible version.
>> >
>> > I'd try something like this:
>> >
>> > dats <- structure(list(Study = c(1L, 1L, 2L, 2L, 3L, 3L), TX = c(1L,
>> > 0L, 1L, 0L, 1L, 0L), AEs = c(3L, 2L, 1L, 2L, 1L, 1L), N = c(5L,
>> > 7L, 10L, 7L, 8L, 4L)), .Names = c("Study", "TX", "AEs", "N"), class =
>> > "data.frame", row.names = c("1",
>> > "2", "3", "4", "5", "6"))
>> >
>> > # See how handy dput can be :-)
>> >
>> > dats[unlist(mapply(FUN = function(x,y) rep(x, y), 1:NROW(dats),
>> > dats$N)), -4]
>> >
>> > which isn't super elegant, but others might have something better.
>> >
>> > Best,
>> > Michael
>> >
>> > On Tue, May 15, 2012 at 1:24 AM, Cheenghee AM Koh <sigontw at gmail.com>
>> > wrote:
>> >> Hello, R-fellows,
>> >>
>> >> I have a question that I really don't know how to solve. I have spent
>> >> hours searching online for possible solutions, but in vain. If anyone
>> >> could help me handle this issue, it would be much appreciated!
>> >>
>> >> I have a "grouped" dataset like this:
>> >>
>> >>> data
>> >>   Study TX AEs  N
>> >> 1     1  1   3  5
>> >> 2     1  0   2  7
>> >> 3     2  1   1 10
>> >> 4     2  0   2  7
>> >> 5     3  1   1  8
>> >> 6     3  0   1  4
>> >>
>> >> where Study is the study id, TX is the treatment, AEs is how many people
>> >> in that arm are positive, and N is the number of subjects. Row 1
>> >> therefore stands for the treatment arm of study 1, where there are 5
>> >> subjects and 3 of them are positive. Row 2 stands for the control arm
>> >> of study 1, where there are 7 subjects and 2 of them are positive.
>> >>
>> >> Now I would like to "un-group" them, so that the data look like:
>> >>
>> >> Study  TX   AEs
>> >>   1         1      1
>> >>   1         1      1
>> >>   1         1      1
>> >>   1         1      0
>> >>   1         1      0
>> >>   1         0      1
>> >>   1         0      1
>> >>   1         0      0
>> >>   1         0      0
>> >>   1         0      0
>> >>   1         0      0
>> >>   1         0      0
>> >>   2         1      1
>> >>   .....................
>> >>  .....................
>> >>
>> >>
>> >> But I wasn't able to do it. I wrote a small function and used
>> >> "lapply" to apply it to each row. It worked well and gave me the rows
>> >> I wanted, but I wasn't able to collapse all the returned pieces into
>> >> one single data frame for subsequent analysis.
>> >>
>> >> The function I wrote:
>> >>
>> >> subset = function(i){
>> >> d = c(rep(data[i,1], data[i,4]), rep(data[i,2], data[i,4]), rep(0:1,
>> >> c(data[i,4] - data[i,3],data[i,3])))
>> >> d = matrix(d, data[i,4],3)
>> >> d
>> >> }
>> >>
>> >> then:
>> >>
>> >> Data = lapply(1:6, subset)
>> >> Data
>> >>
>> >> Therefore, I tried to write a loop, but no matter what I tried, I
>> >> couldn't get what I wanted.
>> >>
>> >> Any idea?
>> >>
>> >> Thank you so much!
>> >>
>> >> Best,
>> >>
>> >>
>> >> --
>> >> Cheenghee Masaki Koh, MSW, MS(c), PhD Student
>> >> School of Social Service Administration
>> >> Department of Health Studies, Division of Biological Science
>> >> University of Chicago


