Combining Overlapping Data
Dennis Murphy
Sat Nov 12 02:29:56 CET 2011
Hi:
This doesn't sort the data by strain level, but I think it does what
you're after. It helps if strain is either a factor or character
vector in each data frame.
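h <- function(x, y) {
    # Count rows per strain in each data frame
    tbx <- table(x$strain)
    tby <- table(y$strain)
    # Select the strains that have at least one member
    # in each data frame (a factor can carry unused levels,
    # which table() reports with a count of 0)
    mgrps <- intersect(names(tbx[tbx > 0]),
                       names(tby[tby > 0]))
    # Concatenate the rows of both data frames for the common strains
    rbind(subset(x, strain %in% mgrps),
          subset(y, strain %in% mgrps))
}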
# Result:
dc <- h(x, y)
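If you'd rather skip the tables, the same idea can be sketched directly
with intersect() on the strain columns. The toy data below are made up
purely for illustration (the value column and the strain "Chab2001" are
my assumptions, not from your data), and the last step shows one way to
order the result by strain if that matters to you:

```r
# Toy data (hypothetical values, for illustration only)
x <- data.frame(strain = c("Chab1405", "Chab1405", "Chab1999"),
                value  = c(1.2, 3.4, 5.6),
                stringsAsFactors = FALSE)
y <- data.frame(strain = c("Chab1405", "Chab2001"),
                value  = c(7.8, 9.0),
                stringsAsFactors = FALSE)

# Strains present in both data frames; as.character() guards against
# factors whose level sets differ between x and y
common <- intersect(as.character(x$strain), as.character(y$strain))

# Keep only the rows whose strain is in the common set, then stack them
dc2 <- rbind(x[x$strain %in% common, ], y[y$strain %in% common, ])

# Optional: order the combined rows by strain
dc2 <- dc2[order(as.character(dc2$strain)), ]
```

With these toy inputs, dc2 should hold only the "Chab1405" rows from
both data frames; "Chab1999" and "Chab2001" each appear in just one data
frame and are dropped.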
On Fri, Nov 11, 2011 at 1:07 PM, kickout <kyle.kocak at gmail.com> wrote:
> I've scoured the archives but have found no concrete answer to my question.
>
> Problem: Two data sets
>
> 1st data set(x) = 20,000 rows
> 2nd data set(y) = 5,000 rows
>
> Both have the same column names, the column of interest to me is a variable
> called strain.
>
> For example, a strain named "Chab1405" appears in x 150 times and in y 25
> times...
> strain "Chab1999" appears 200 times in x and not at all in y (so I don't want
> that retained).
>
>
> I want to create a new data frame that has all 175 measurements for
> "Chab1405" and any other 'strain' that appears in both data sets,
> but not strains that appear in only one data set. So I want the
> intersection of two data sets (maybe?).
>
> I've tried x %in% y, but that only gives TRUE/FALSE
>
>
