[R] Compare two data sets

David Winsemius dwinsemius at comcast.net
Wed Mar 26 02:50:59 CET 2008


<amarkey at uiuc.edu> wrote in
news:20080325101909.BDK93111 at expms2.cites.uiuc.edu: 

> I would like to compare two data sets saved as text files (example
> below) to determine if both sets are identical(or if dat2 is missing
> information that is included in dat1) and if they are not identical
> list what information is different between the two sets(ie output
> "a1", "a3" as the differing information).  The overall purpose would
> be to remove "a1" and "a3" from dat 1 so both dat1 and dat2 are the
> same.  My R abilities are somewhat limited so any suggestions are
> greatly appreciated. 

I do not understand what it would mean to remove elements so that both
sets "are the same". Why wouldn't you just use the smaller set?
> 
> Alysta
> 
> dat1
> a1
> a2
> a3
> a4
> a5
> a6
> 
> dat2
> a2
> a4
> a5
> a6

You might want to look at the %in% operator. The examples below were
created so that neither dat1 nor dat2 is a subset of the other.

dat1 <- paste('a', 1:6, sep='')
dat2 <- paste('a', c(2,4:6,8,9,10), sep='')
dat1
#[1] "a1" "a2" "a3" "a4" "a5" "a6"
dat2
#[1] "a2"  "a4"  "a5"  "a6"  "a8"  "a9"  "a10"


dat2 %in% dat1
#[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

dat1 %in% dat2
#[1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE
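
As a side note, %in% is a thin wrapper around match(), so the same
logical vector can be built by hand. The sketch below re-creates the
example vectors and compares the two approaches; the names via_in,
via_match and pos are only illustrative.

```r
# Re-create the example vectors so the snippet runs on its own
dat1 <- paste('a', 1:6, sep='')
dat2 <- paste('a', c(2,4:6,8,9,10), sep='')

# %in% answers "is each element of dat1 present somewhere in dat2?"
via_in <- dat1 %in% dat2

# The same test written with match(): positions with no match become 0
via_match <- match(dat1, dat2, nomatch = 0) > 0

# match() without nomatch gives the position in dat2, or NA if absent
pos <- match(dat1, dat2)

print(identical(via_in, via_match))
print(pos)
```

match() is the one to reach for when the position of each element in
the other vector is needed, not just whether it is present.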

### Then use the logical vectors as index arguments,
### first to get the common elements
dat1[dat1 %in% dat2]
#[1] "a2" "a4" "a5" "a6"

dat2[dat2 %in% dat1]
#[1] "a2" "a4" "a5" "a6"

### and then to find the non-shared elements
dat2[!(dat2 %in% dat1)]
#[1] "a8"  "a9"  "a10"
dat1[!(dat1 %in% dat2)]
#[1] "a1" "a3"
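
The base set functions give the same answers more directly. What
follows is only a sketch: it assumes each text file holds one value per
line with no header, and it uses temporary files as stand-ins for the
poster's real files. The names f1, f2, common, only_dat1, only_dat2 and
dat1_trimmed are made up for the illustration.

```r
# Write the example values to temporary files (stand-ins for real files)
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(paste('a', 1:6, sep=''), f1)
writeLines(paste('a', c(2,4:6,8,9,10), sep=''), f2)

# Read them back, one element per line
dat1 <- readLines(f1)
dat2 <- readLines(f2)

# Are the two sets identical (ignoring order and duplicates)?
same <- setequal(dat1, dat2)

# Elements shared by both, and elements unique to each
common    <- intersect(dat1, dat2)
only_dat1 <- setdiff(dat1, dat2)
only_dat2 <- setdiff(dat2, dat1)

# Drop from dat1 everything that dat2 does not have
dat1_trimmed <- dat1[dat1 %in% dat2]

print(same)
print(only_dat1)
print(only_dat2)
print(dat1_trimmed)
```

Note that intersect() and setdiff() return unique values, so if the
files can contain repeated entries that must be kept, the %in% indexing
is the safer choice.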

-- 
David Winsemius


