[R] Matched pairs with two data frames

Udo ukoenig at med.uni-marburg.de
Thu Apr 17 22:04:19 CEST 2008


Daniel,
thank you!

I want to perform the simplest kind of matching:
a one-to-one exact match (by age and school):
for every case in "treat", find ONE case (if there is one) in "control".
The cases in "control" that have been matched should be tagged as
not available or removed (deleted) from the control pool (thus,
the used ones are not replaced).

#treatment group
treat <- data.frame(age=c(1,1,2,2,2,4),
                    school=c(10,10,20,20,20,11),
                    out1=c(9.5,2.3,3.3,4.1,5.9,4.6))

#control group
control <- data.frame(age=c(1,1,1,1,3,2),
                      school=c(10,10,10,10,33,20),
                      out2=c(1.1,2,3.5,4.9,5.2,6.5))

#one-to-one exact matching algorithm ????

matched.data.frame <- ?????

In my example I matched the cases "by hand" to make things clear.
Case 1 from "treat" was matched with case 1 from "control",
2 with 2 and 3 with 6. Cases 4, 5 and 6 could not be matched,
because there is no "partner" left for them in "control".
Thus my matched example data frame has 3 cases.
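
One way the matching described above might be done in base R, offered only as a sketch: number the cases within each (age, school) cell in both data frames, then merge on age, school and that running number. The helper column "k" is my own invention for illustration, and the pairing simply follows row order within each cell, which is an assumption, not a requirement of the problem.

```r
# Data as given above
treat <- data.frame(age=c(1,1,2,2,2,4),
                    school=c(10,10,20,20,20,11),
                    out1=c(9.5,2.3,3.3,4.1,5.9,4.6))
control <- data.frame(age=c(1,1,1,1,3,2),
                      school=c(10,10,10,10,33,20),
                      out2=c(1.1,2,3.5,4.9,5.2,6.5))

# Running number within each (age, school) cell: 1, 2, 3, ...
# "k" is a hypothetical helper column, not part of the original data.
treat$k   <- ave(seq_len(nrow(treat)),   treat$age,   treat$school,
                 FUN = seq_along)
control$k <- ave(seq_len(nrow(control)), control$age, control$school,
                 FUN = seq_along)

# Merging on age, school AND k pairs the 1st treated case in a cell with
# the 1st control in that cell, the 2nd with the 2nd, and so on. Each
# control is used at most once, so this is matching without replacement.
matched.data.frame <- merge(treat, control, by = c("age", "school", "k"))
matched.data.frame$k <- NULL

matched.data.frame
```

With the example data this should give the three pairs matched by hand above. Adding all.x = TRUE to the merge() call would keep the unmatched treated cases as well, with NA in out2.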


Regards
Udo



Quoting Daniel Malter <daniel at umd.edu>:

> Hi, sorry for jumping in here, but to me your description of why you want to
> have only the needed data rows remains ambiguous.
>
> If you just want to select the data you indicate then you do:
>
> selected.data <- data[data$needed == "yes", ]
>
> where "data" is the name of your long dataset (13 obs). As I see it, you are
> just selecting data rows from the whole data set; you are not merging or
> unstacking data in any way.
>
> If that is true, it would be more helpful to know why it is lines 1, 6, and
> 9 that you need and not the others. That is, is there a systematic reason
> for "yes" and "no" in the "needed" variable - a reason that could be coded?
> Or is it just your (arbitrary) selection? This is the part of your questions
> that I completely do not understand. If the answer to the aforementioned
> question is yes, there is a reason, then we would need to know the criteria
> on which to base the coding for this variable. What makes a yes a yes and a
> no a no?
>
> Cheers,
> Daniel
>
>
>
>
> -------------------------
> cuncta stricte discussurus
> -------------------------
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Udo
> Sent: Wednesday, April 16, 2008 5:59 AM
> To: r-help at r-project.org
> Subject: Re: [R] Matched pairs with two data frames
>
> Patrick,
> my intention was to perform a one-to-one exact match, which pairs each
> treated unit with ONE control unit (without replacement), using my two
> confounders (age, school) for matching.
>
>
> Patrick Connolly wrote:
> On Mon, 14-Apr-2008 at 08:37AM +0200, Udo wrote:
>
> |> Quoting Peter Alspach <PAlspach at hortresearch.co.nz>:
> |>
> |> > Udo
> |> >
> |> > Seems you might want merge()
> |> >
> |> > HTH .......
> |> >
> |> > Peter Alspach
> |>
> |> Thank you Peter and Jorge,
> |>
> |> but as I had written in my last sentence, "Merge doesn't do the job,
> |> because it makes all possible matches". Maybe there is a
> |> sophisticated solution with "merge" that I could not find.
>
> >Maybe it would help if we knew what you mean by 'all' in this context.
> >To get the NAs in your example, it is NECESSARY to use the all = TRUE
> >argument.  Without the all = TRUE, the NA rows are omitted.
>
> By 'all' I mean that in the merged data frame (13 obs) there are 8 cases
> (2*4) with age=1 and school=10 (all possible combinations).
>
> >What is it that you don't want in this:
> I only "need" lines 1, 6 and 9. To show this, I added "needed" by hand.
>
>    age school out1 out2 needed
> 1    1     10  9.5  1.1    yes
> 2    1     10  9.5  2.0     no
> 3    1     10  9.5  3.5     no
> 4    1     10  9.5  4.9     no
> 5    1     10  2.3  1.1     no
> 6    1     10  2.3  2.0    yes
> 7    1     10  2.3  3.5     no
> 8    1     10  2.3  4.9     no
> 9    2     20  3.3  6.5    yes
> 10   2     20  4.1  6.5     no
> 11   2     20  5.9  6.5     no
> 12   3     33   NA  5.2     no
> 13   4     11  4.6   NA     no
>
> >Whatever it is, can't you subset them out?
> Yes, that's the problem. To describe what I mean, I added the variable
> "needed" by hand. I don't know how to compute such a variable to subset on.
>
>
> My final data frame should look like this:
>    age school out1 out2 needed
> 1    1     10  9.5  1.1    yes
> 6    1     10  2.3  2.0    yes
> 9    2     20  3.3  6.5    yes
>
> I hope I have made clear what the problem is and what I mean.
>
> An alternative would be using packages like “Matching” or “MatchIt”, which
> need a “long” data structure with one data frame and not a “wide” one with
> two data frames.
>
>
> Many thanks!
> Udo
>



--------------------------------------------
Udo K    N     G
      Ö     I

Clinic for Child and Adolescent Psychiatry
Philipps University of Marburg / Germany


