[R] problem on merge data

Fri Nov 6 13:19:19 CET 2009

Hi,
So you want to randomly throw away data? Doesn't sound like a good idea to me...

You can get the combined data set using

data3 <- merge(data2, data1, all=TRUE)

From there it's just a matter of randomly deleting rows in which the
combination of areaid, x1 and y1 is duplicated. I'll leave that to
you, but I encourage you to think about whether this is really what
you want.
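
For what it's worth, here is a minimal sketch of that deletion step. It
rebuilds the example data with data.frame() instead of matrix() so the
numeric columns stay numeric, and the names shuffled, keep and result
and the seed are only illustrative. Note that it keeps one randomly
chosen match per row of data2 (areaid, x1, y1), but does not by itself
guarantee that every record in data1 ends up with exactly two matches.

```r
# Example data from the quoted message, rebuilt with data.frame()
data1 <- data.frame(
  areaid = c(1, 1, 2, 3),
  x      = c(1.2, 1.5, 0.2, 1.5),
  y      = c(1.3, 2.3, 3.3, 1.3),
  date   = c("3/23/2004", "3/22/2004", "4/23/2004", "5/22/2004"),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  areaid = c(1, 1, 1, 1, 2, 2, 3, 3),
  x1     = c(1.22, 1.53, 1.21, 1.52, 0.21, 0.23, 1.57, 1.59),
  y1     = c(1.32, 2.34, 1.37, 2.35, 3.33, 3.35, 1.31, 1.33)
)

# Every data2 row paired with every data1 row that shares its areaid
data3 <- merge(data2, data1, all = TRUE)

# Shuffle the rows, then keep the first occurrence of each
# (areaid, x1, y1) combination; after shuffling, "first" is random
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
shuffled <- data3[sample(nrow(data3)), ]
keep     <- !duplicated(shuffled[c("areaid", "x1", "y1")])
result   <- shuffled[keep, ]

# Sort for readability
result <- result[order(result$areaid, result$x1), ]
result
```

If the 1:2 balance matters, you would need to assign the rows of data2
to the rows of data1 within each areaid explicitly rather than delete
duplicates after the fact.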

-Ista

On Thu, Nov 5, 2009 at 11:34 PM, rusers.sh <rusers.sh at gmail.com> wrote:
> Hi there,
> data1<-matrix(data=c(1,1.2,1.3,"3/23/2004",1,1.5,2.3,"3/22/2004",2,0.2,3.3,"4/23/2004",3,1.5,1.3,"5/22/2004"),nrow=4,ncol=4,byrow=TRUE)
> data1<-data.frame(data1)
> names(data1)<-c("areaid","x","y","date")
> data1
>
>   areaid   x   y      date
> 1      1 1.2 1.3 3/23/2004
> 2      1 1.5 2.3 3/22/2004
> 3      2 0.2 3.3 4/23/2004
> 4      3 1.5 1.3 5/22/2004
> data2<-matrix(data=c(1,1.22,1.32,1,  1.53,  2.34,1,  1.21,  1.37,1,  1.52,
> 2.35,2,  0.21,  3.33,2,  0.23,  3.35,3,  1.57, 1.31,3,  1.59,
> 1.33),nrow=8,ncol=3,byrow=TRUE)
> data2<-data.frame(data2)
> names(data2)<-c("areaid","x1","y1")
> data2
>
>   areaid   x1   y1
> 1      1 1.22 1.32
> 2      1 1.53 2.34
> 3      1 1.21 1.37
> 4      1 1.52 2.35
> 5      2 0.21 3.33
> 6      2 0.23 3.35
> 7      3 1.57 1.31
> 8      3 1.59 1.33
>  To explain the two datasets: you can treat data1 as the case dataset and
> data2 as the control dataset. Note that data2 has twice as many records as
> data1 in each areaid, something like a 1:2 matched case-control study
> design. I hope to merge data1 and data2. Take areaid=1 as an example.
> From the two datasets, we can see that data1 has two points (x,y) in
> areaid=1, and data2 has four points (x1,y1) in areaid=1. Each record in
> data1 will have two matched records in data2. I want to randomly select
> half of the points of areaid=1 in data2 to link to one record of areaid=1
> in data1, and the other half to link to the other record of areaid=1 in
> data1. Actually, the number of records in the same areaid will be over 2
> in the actual dataset; this is only an example to explain the problem.
> The cases of areaid=2 or 3 are a little easier than areaid=1 because
> there is only one record in data1.
>  The final results are something like the following dataset.
> areaid    x1    y1       date    x    y
>      1  1.22  1.32  3/23/2004  1.2  1.3
>      1  1.53  2.34  3/22/2004  1.2  1.3
>      1  1.21  1.37  3/23/2004  1.5  2.3
>      1  1.52  2.35  3/22/2004  1.5  2.3
>      2  0.21  3.33  4/23/2004  0.2  3.3
>      2  0.23  3.35  4/23/2004  0.2  3.3
>      3  1.57  1.31  5/22/2004  1.5  1.3
>      3  1.59  1.33  5/22/2004  1.5  1.3
>
>   Any suggestions or help are greatly appreciated.
>  Thanks a lot.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org