[R] matching each row

Marc Schwartz marc_schwartz at me.com
Wed Jul 8 20:10:32 CEST 2009


On Jul 8, 2009, at 10:09 AM, tathta wrote:

> I have two dataframes, the first column of each dataframe is a unique id
> number (the rest of the columns are data variables).
> I would like to figure out how many times each id number appears in each
> dataframe.
>
> So far I can use:
> length( match (dataframeA$unique.id[1], dataframeB$unique.id) )
>
> but this only works on each row of dataframe A one-at-a-time.
>
> I would like to do this for all of the rows in dataframe A, and then put the
> results in a new variable: dataframeA$count
>
> I'm new to R, so please be patient with me!
>
> Sorry if this question has already been answered, my search of the archives
> only brought up one relevant post, and I didn't understand the answer to
> it....  http://www.nabble.com/match-to20799206.html#a20799206


If I am correctly understanding what you are looking for, you could do  
something like the following:

# Create some simple data. Note that only a subset of the IDs (3:5)
# will match across the two DFs:
set.seed(1)
DF.A <- data.frame(ID = sample(1:5, 10, replace = TRUE))
DF.B <- data.frame(ID = sample(3:7, 10, replace = TRUE))

 > DF.A
    ID
1   2
2   2
3   3
4   5
5   2
6   5
7   5
8   4
9   4
10  1

 > DF.B
    ID
1   4
2   3
3   6
4   4
5   6
6   5
7   6
8   7
9   4
10  6
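
As of R 3.6.0, sample() uses a different default sampling algorithm, so set.seed(1) may not reproduce the values printed above on a current installation. A minimal sketch that sidesteps this by entering the printed IDs literally (the values are copied from the output above, nothing else is assumed):

```r
# Literal copies of the IDs printed above, so that the later
# table() and merge() steps give the same results on any R version
DF.A <- data.frame(ID = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
DF.B <- data.frame(ID = c(4, 3, 6, 4, 6, 5, 6, 7, 4, 6))

# IDs present in both data frames
intersect(DF.A$ID, DF.B$ID)
```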


Now, create counts of the IDs in each, coercing the results to data  
frames and setting the count column name for each:

TAB.A <- as.data.frame(table(DF.A$ID), responseName = "Count.A")
TAB.B <- as.data.frame(table(DF.B$ID), responseName = "Count.B")

 > TAB.A
   Var1 Count.A
1    1       1
2    2       3
3    3       1
4    4       2
5    5       3

 > TAB.B
   Var1 Count.B
1    3       1
2    4       3
3    5       1
4    6       4
5    7       1
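
If the default column name Var1 is inconvenient, one possible variation is to name the table dimension when calling table(), and to ask as.data.frame() not to create a factor. This is only a sketch of an alternative, not a change to the steps in this post, which keep using Var1:

```r
# Same IDs as printed for DF.A above
DF.A <- data.frame(ID = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))

# Naming the argument to table() carries through as the column name;
# stringsAsFactors = FALSE returns the IDs as character rather than factor
TAB.A2 <- as.data.frame(table(ID = DF.A$ID),
                        responseName = "Count.A",
                        stringsAsFactors = FALSE)

# Convert the IDs back to integers so they sort and compare numerically
TAB.A2$ID <- as.integer(TAB.A2$ID)
TAB.A2
```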


Now, use merge() to join the two tables above on Var1. 'all = TRUE'
keeps the IDs that appear in only one of them:

 > merge(TAB.A, TAB.B, by = "Var1", all = TRUE)
   Var1 Count.A Count.B
1    1       1      NA
2    2       3      NA
3    3       1       1
4    4       2       3
5    5       3       1
6    6      NA       4
7    7      NA       1


Note that an ID (Var1) that appears in only one of the data frames gets NA for the other one's count.
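
If zeros are more useful than NAs for the absent IDs, the merged counts can be filled in afterwards. A minimal sketch, assuming the same data as above and a hypothetical result name COUNTS:

```r
# Same IDs as printed for DF.A and DF.B above
DF.A <- data.frame(ID = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
DF.B <- data.frame(ID = c(4, 3, 6, 4, 6, 5, 6, 7, 4, 6))

TAB.A <- as.data.frame(table(DF.A$ID), responseName = "Count.A")
TAB.B <- as.data.frame(table(DF.B$ID), responseName = "Count.B")

# COUNTS is a name chosen for this example only
COUNTS <- merge(TAB.A, TAB.B, by = "Var1", all = TRUE)

# An ID missing from one data frame occurred zero times in it
COUNTS$Count.A[is.na(COUNTS$Count.A)] <- 0L
COUNTS$Count.B[is.na(COUNTS$Count.B)] <- 0L
COUNTS
```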

See ?table, ?as.data.frame and ?merge for more information.
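
If what you want is literally a new column in the first data frame holding, for each row, the number of times that row's ID occurs in the second one, something along these lines may work. It is a sketch using the example data above rather than your real data frames. Note that length(match(x, table)) always returns the length of x, so it does not count occurrences.

```r
# Same IDs as printed for DF.A and DF.B above
DF.A <- data.frame(ID = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
DF.B <- data.frame(ID = c(4, 3, 6, 4, 6, 5, 6, 7, 4, 6))

# Count each distinct ID in DF.B once
cnt.B <- table(DF.B$ID)

# For every row of DF.A, find the position of its ID among the counted IDs
# (NA where the ID never occurs in DF.B)
idx <- match(DF.A$ID, names(cnt.B))

# Look up the counts; IDs absent from DF.B get a count of 0
DF.A$count <- as.vector(cnt.B)[idx]
DF.A$count[is.na(idx)] <- 0L
DF.A
```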

HTH,

Marc Schwartz



