[R] ddply (or other suitable solution) question

Fri Sep 14 16:19:53 CEST 2018

Thank you all, Bert's idea will get it done... Good question also about what happens if an ID has only 1 row: I have a separate plan for that... Anyhow, finishing up Bert's lines with
z<-lapply(ix, function(i)   df[i,])
lapply(z, function(x) split(x, rep(1:ceiling(nrow(x)/2), each=2)[1:nrow(x)]))

seems to do what I need,
thanks again...

Andras  
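
For anyone reading the archive later, here is the whole thing as one self-contained sketch: Bert's index doubling followed by the split into pairs. It indexes with x[2:(n - 1)] instead of x[2]:x[n - 1], which is a small change so that the rows of an ID need not be contiguous. The name `pairs` is only illustrative, and an ID with a single row simply comes back as one one-row piece here, which may or may not be what you want.

```r
# Rebuild the example data from the thread
# (set.seed() truncates 123.456 to the integer 123).
set.seed(123)
df <- data.frame(ID   = c(1,1,2,2,2,3,3,3,3,4,4,5,5),
                 read = c(1,1,0,1,1,1,0,0,0,1,0,0,0),
                 int  = c(1,1,0,0,0,1,1,0,0,1,1,1,1),
                 z    = rnorm(13, 1, 5),
                 y    = rnorm(13, 1, 5))

# Step 1: per ID, repeat every interior row index twice
# (first and last row of each ID appear once).
ix <- tapply(seq_len(nrow(df)), df$ID, function(x) {
  n <- length(x)
  if (n > 2) c(x[1], rep(x[2:(n - 1)], each = 2), x[n]) else x
})

# Step 2: cut each expanded block into consecutive chunks of two rows.
pairs <- lapply(ix, function(i) {
  d <- df[i, ]
  split(d, rep(seq_len(ceiling(nrow(d) / 2)), each = 2)[seq_len(nrow(d))])
})

lengths(pairs)  # number of pairs per ID: 1 2 3 1 1
```

pairs[["3"]][[2]] then holds original rows 7 and 8 (shown with row names 7.1 and 8, since row 7 is used twice).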

    On Thursday, September 13, 2018, 5:16:54 PM EDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:  

Modulo my earlier question, it seems that you just want to replicate all
interior rows (every row but the first and last) within an id if there are
more than 2 rows. If this is incorrect, ignore the rest of this post.

Otherwise...

(I assume the data frame is listed in ID order, whatever that is)

set.seed(123.456)
df <-data.frame(ID=c(1,1,2,2,2,3,3,3,3,4,4,5,5),
                read=c(1,1,0,1,1,1,0,0,0,1,0,0,0),
                int=c(1,1,0,0,0,1,1,0,0,1,1,1,1),
                z=rnorm(13,1,5),
                y=rnorm(13,1,5))

yielded on my Mac and R version 3.5.1

> df
   ID read int          z           y
1   1    1   1 -1.8023782  1.55341358
2   1    1   1 -0.1508874 -1.77920567
3   2    0   0  8.7935416  9.93456568
4   2    1   0  1.3525420  3.48925239
5   2    1   0  1.6464387 -8.83308578
6   3    1   1  9.5753249  4.50677951
7   3    0   1  3.3045810 -1.36395704
8   3    0   0 -5.3253062 -4.33911853
9   3    0   0 -2.4342643 -0.08987457
10  4    1   1 -1.2283099 -4.13002224
11  4    0   1  7.1204090 -2.64445615
12  5    0   1  2.7990691 -2.12519634
13  5    0   1  3.0038573 -7.43346655

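## The following doubles up the interior rows (all but first and last) within each ID.
## Indexing with x[2:(lenx-1)] rather than x[2]:x[lenx-1] also works if an ID's rows are not contiguous.
> ix <- tapply(seq_len(nrow(df)),df$ID,
+              function(x){
+                lenx <- length(x)
+                if(lenx > 2)
+                    c(x[1],rep(x[2:(lenx-1)],each=2),x[lenx])
+                else x
+              }
+    )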
> ix
$`1`
[1] 1 2

$`2`
[1] 3 4 4 5

$`3`
[1] 6 7 7 8 8 9

$`4`
[1] 10 11

$`5`
[1] 12 13

## now use the ix list to break up df:

> lapply(ix, function(i)df[i,])
$`1`
  ID read int          z         y
1  1    1   1 -1.8023782  1.553414
2  1    1   1 -0.1508874 -1.779206

$`2`
    ID read int        z         y
3    2    0   0 8.793542  9.934566
4    2    1   0 1.352542  3.489252
4.1  2    1   0 1.352542  3.489252
5    2    1   0 1.646439 -8.833086

$`3`
    ID read int         z           y
6    3    1   1  9.575325  4.50677951
7    3    0   1  3.304581 -1.36395704
7.1  3    0   1  3.304581 -1.36395704
8    3    0   0 -5.325306 -4.33911853
8.1  3    0   0 -5.325306 -4.33911853
9    3    0   0 -2.434264 -0.08987457

$`4`
   ID read int         z         y
10  4    1   1 -1.228310 -4.130022
11  4    0   1  7.120409 -2.644456

$`5`
   ID read int        z         y
12  5    0   1 2.799069 -2.125196
13  5    0   1 3.003857 -7.433467

I leave it to you to modify the lapply() function to break up each id
data frame into sublists of pairs if that is what you wish to do.
Assuming again that this is actually what you want.
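
If overlapping pairs of consecutive rows are indeed the goal, the pairs can also be built directly, without doubling any indices first. This is a different technique from the tapply()/rep() approach above, shown only as a minimal sketch: the helper name consecutive_pairs and the toy data frame are made up for illustration, and returning a lone row as a single one-row piece is an assumption about the desired behaviour.

```r
# Hypothetical helper: list of 2-row data frames (rows i and i+1) for one ID block.
consecutive_pairs <- function(d) {
  n <- nrow(d)
  if (n < 2) return(list(d))  # assumption: a single row is returned as-is
  lapply(seq_len(n - 1), function(i) d[c(i, i + 1), ])
}

# Toy data (not the data from this thread): IDs with 2, 3 and 1 rows.
toy <- data.frame(ID = c(1, 1, 2, 2, 2, 3), v = 1:6)

# Top level: one element per ID; bottom level: the consecutive pairs.
result <- lapply(split(toy, toy$ID), consecutive_pairs)

result[["2"]][[2]]  # rows 4 and 5 of toy
```

Because no row is indexed twice within a pair, the row names stay 4 and 5 rather than 4.1 and 5.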

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Thu, Sep 13, 2018 at 1:40 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>
> What if there is only one read in the id?
>
>
> Bert Gunter
>
> On Thu, Sep 13, 2018 at 12:11 PM Andras Farkas via R-help
> <r-help using r-project.org> wrote:
> >
> > Dear All,
> >
> > I have data frame:
> > set.seed(123.456)
> > df <-data.frame(ID=c(1,1,2,2,2,3,3,3,3,4,4,5,5),
> >                read=c(1,1,0,1,1,1,0,0,0,1,0,0,0),
> >                int=c(1,1,0,0,0,1,1,0,0,1,1,1,1),
> >                z=rnorm(13,1,5),
> >                y=rnorm(13,1,5))
> >
> > What I would like to achieve (as best as I see it now) is to create a list of lists from the data in df: the top level has one element per group in the ID column, and within each group the bottom level "joins together" each row with the row that follows it, so the result would look like this:
> >
> > [[ID=1]]
> > [[1]][[1]]
> >  ID read int        z        y
> >   1    1   1 5.188935 5.107905
> >   1    1   1 1.766866 4.443201
> >
> > [[ID=2]]
> > [[2]][[1]]
> >  ID read int         z         y
> >   2    0   0 -4.690685 3.7695883
> >   2    1   0  7.269075 0.6904414
> >
> > [[ID=2]]
> > [[2]][[2]]
> >  ID read int        z          y
> >   2    1   0 7.269075  0.6904414
> >   2    1   0 3.132321 -0.5298133
> >
> > [[ID=3]]
> > [[3]][[1]]
> >  ID read int          z         y
> >   3    1   1 -0.4753574 -0.902355
> >   3    0   1  5.4756283 -2.473535
> >
> > [[ID=3]]
> > [[3]][[2]]
> >  ID read int        z           y
> >   3    0   1 5.475628 -2.47353489
> >   3    0   0 5.390667 -0.03958639
> >
> >
> > hoping the example is clear enough... all your help is appreciated,
> >
> > thanks,
> >
> >
> >
> > Andras
> >