[R] Matched pairs with two data frames

Daniel Malter daniel at umd.edu
Wed Apr 16 12:19:12 CEST 2008


Hi, sorry for jumping in here, but to me your description of why you want to
have only the needed data rows remains ambiguous.

If you just want to select the data you indicate then you do:

selected.data <- data[data$needed == "yes", ]

where "data" is the name of your long dataset (13 obs). As I see it, you are
just selecting data rows from the whole data set; you are not merging or
unstacking data in any way.

If that is true, it would be more helpful to know why it is lines 1, 6, and
9 that you need and not the others. That is, is there a systematic reason
for "yes" and "no" in the "needed" variable - a reason that could be coded?
Or is it just your (arbitrary) selection? This is the part of your question
that I do not understand at all. If there is a systematic reason, then we
would need to know the criteria on which to base the coding of this variable.
What makes a yes a yes and a no a no?

Cheers,
Daniel




-------------------------
cuncta stricte discussurus
-------------------------

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Udo
Sent: Wednesday, April 16, 2008 5:59 AM
To: r-help at r-project.org
Subject: Re: [R] Matched pairs with two data frames

Patrick,
my intention was to perform a one-to-one exact match that pairs each
treated unit with ONE control unit (without replacement), using my two
confounders (age, school) for matching.


Patrick Connolly wrote:
On Mon, 14-Apr-2008 at 08:37AM +0200, Udo wrote:

|> Quoting Peter Alspach <PAlspach at hortresearch.co.nz>:
|>
|> > Udo
|> >
|> > Seems you might want merge()
|> >
|> > HTH .......
|> >
|> > Peter Alspach
|>
|> Thank you Peter and Jorge,
|>
|> but as I had written in my last sentence, "Merge doesn't do the job,
|> because it makes all possible matches". Maybe there is a
|> sophisticated solution with "merge" that I could not figure out.

>Maybe it would help if we knew what you mean by 'all' in this context.
>To get the NAs in your example, it is NECESSARY to use the all = TRUE 
>argument.  Without the all = TRUE, the NA rows are omitted.

By 'all' I mean that in the merged data frame (13 obs.) there are 8 rows
(2*4) with age=1 and school=10 (all possible combinations).
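
To illustrate the "all possible combinations" behaviour, here is a small sketch. The two input data frames are not shown in this message, so `treated` and `control` (names and contents) are assumptions reconstructed from the merged table quoted in this thread.

```r
# Assumed inputs, reconstructed from the 13-row merged table in this thread.
treated <- data.frame(age    = c(1, 1, 2, 2, 2, 4),
                      school = c(10, 10, 20, 20, 20, 11),
                      out1   = c(9.5, 2.3, 3.3, 4.1, 5.9, 4.6))
control <- data.frame(age    = c(1, 1, 1, 1, 2, 3),
                      school = c(10, 10, 10, 10, 20, 33),
                      out2   = c(1.1, 2.0, 3.5, 4.9, 6.5, 5.2))

# merge() pairs every treated row with every control row that shares the keys;
# all = TRUE additionally keeps unmatched rows, filled with NA.
merged <- merge(treated, control, by = c("age", "school"), all = TRUE)

nrow(merged)                                          # 13
n_cell <- sum(merged$age == 1 & merged$school == 10)  # 2 treated * 4 controls = 8
```

The 8 rows in the cell age=1, school=10 come from 2 treated units crossed with 4 control units, which is why plain merge() cannot give a one-to-one pairing by itself.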

>What is it that you don't want in this:
I only "need" lines 1, 6 and 9. To show this, I added "needed" by hand.

   age school out1 out2 needed
1    1     10  9.5  1.1    yes
2    1     10  9.5  2.0     no
3    1     10  9.5  3.5     no
4    1     10  9.5  4.9     no
5    1     10  2.3  1.1     no
6    1     10  2.3  2.0    yes
7    1     10  2.3  3.5     no
8    1     10  2.3  4.9     no
9    2     20  3.3  6.5    yes
10   2     20  4.1  6.5     no
11   2     20  5.9  6.5     no
12   3     33   NA  5.2     no
13   4     11  4.6   NA     no

>Whatever it is, can't you subset them out?
Yes, that's the problem. To describe what I mean, I added the variable
"needed" by hand. I don't know how to compute such a variable to subset on.


My final data frame should look like this:
   age school out1 out2 needed
1    1     10  9.5  1.1    yes
6    1     10  2.3  2.0    yes
9    2     20  3.3  6.5    yes
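
One possible way to get a result like this, sketched under assumptions: number the rows within each (age, school) cell in both data frames and merge on that running number as well. The data frames `treated` and `control` and the helper column `k` are hypothetical (the inputs are reconstructed from the merged table in this thread), and pairing by row order is only one of many valid one-to-one assignments.

```r
# Assumed inputs, reconstructed from the 13-row merged table in this thread.
treated <- data.frame(age    = c(1, 1, 2, 2, 2, 4),
                      school = c(10, 10, 20, 20, 20, 11),
                      out1   = c(9.5, 2.3, 3.3, 4.1, 5.9, 4.6))
control <- data.frame(age    = c(1, 1, 1, 1, 2, 3),
                      school = c(10, 10, 10, 10, 20, 33),
                      out2   = c(1.1, 2.0, 3.5, 4.9, 6.5, 5.2))

# k = running number (1, 2, 3, ...) of each row within its (age, school) cell.
treated$k <- ave(seq_len(nrow(treated)), treated$age, treated$school,
                 FUN = seq_along)
control$k <- ave(seq_len(nrow(control)), control$age, control$school,
                 FUN = seq_along)

# Merging on the cell AND k pairs the i-th treated unit of a cell with the
# i-th control unit of the same cell, so no control is used twice.
matched <- merge(treated, control, by = c("age", "school", "k"))
matched <- matched[order(matched$age, matched$school, matched$k), ]
matched
```

Units without a partner (surplus treated or control rows in a cell, or cells present in only one data frame) are dropped. If the pairing within a cell should be random rather than by row order, the rows could be shuffled before computing `k`.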

I hope I have made clear what the problem is and what I mean.

An alternative would be using packages like “Matching” or “MatchIt”, which
need a “long” data structure with one data frame and not a “wide” one with
two data frames.
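
For reference, a minimal sketch of how the two "wide" data frames might be stacked into one "long" data frame. The names `treated`, `control`, `treat` and `out` are assumptions (the inputs are reconstructed from the merged table in this thread), and the exact layout each package expects should be checked in its documentation.

```r
# Assumed inputs, reconstructed from the 13-row merged table in this thread.
treated <- data.frame(age    = c(1, 1, 2, 2, 2, 4),
                      school = c(10, 10, 20, 20, 20, 11),
                      out1   = c(9.5, 2.3, 3.3, 4.1, 5.9, 4.6))
control <- data.frame(age    = c(1, 1, 1, 1, 2, 3),
                      school = c(10, 10, 10, 10, 20, 33),
                      out2   = c(1.1, 2.0, 3.5, 4.9, 6.5, 5.2))

# One row per unit: a 0/1 treatment indicator, the confounders, and a single
# outcome column instead of out1/out2.
long <- rbind(
  data.frame(treat = 1, age = treated$age, school = treated$school,
             out = treated$out1),
  data.frame(treat = 0, age = control$age, school = control$school,
             out = control$out2)
)
long
```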


Many thanks!
Udo



