[R] Combining Overlapping Data

Dennis Murphy djmuser at gmail.com
Sat Nov 12 02:29:56 CET 2011


This doesn't sort the data by strain level, but I think it does what
you're after. It helps if strain is either a factor or character
vector in each data frame.

h <- function(x, y) {
  # Count the rows for each strain in each data frame
  tbx <- table(x$strain)
  tby <- table(y$strain)
  # Select the strains that have at least one member
  # in each data frame
  mgrps <- intersect(names(tbx[tbx > 0]),
                     names(tby[tby > 0]))
  # Concatenate the rows of both data frames that belong to common strains
  rbind(subset(x, strain %in% mgrps),
        subset(y, strain %in% mgrps))
}

# Result:
dc <- h(x, y)
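A minimal reproducible sketch follows. It assumes hypothetical toy data frames built to match the counts in the question (the `value` column and its random numbers are made up for illustration), and it repeats the corrected `h()` so the block runs on its own.

```r
# Keep rows whose strain occurs in both data frames
h <- function(x, y) {
  tbx <- table(x$strain)
  tby <- table(y$strain)
  mgrps <- intersect(names(tbx[tbx > 0]), names(tby[tby > 0]))
  rbind(subset(x, strain %in% mgrps),
        subset(y, strain %in% mgrps))
}

# Hypothetical toy data mirroring the question's counts:
# "Chab1405" appears 150 times in x and 25 times in y,
# "Chab1999" appears 200 times in x only.
set.seed(1)
x <- data.frame(strain = rep(c("Chab1405", "Chab1999"), times = c(150, 200)),
                value  = rnorm(350),
                stringsAsFactors = FALSE)
y <- data.frame(strain = rep("Chab1405", 25),
                value  = rnorm(25),
                stringsAsFactors = FALSE)

dc <- h(x, y)
nrow(dc)           # 175 rows: 150 from x plus 25 from y
unique(dc$strain)  # only "Chab1405" survives
```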


On Fri, Nov 11, 2011 at 1:07 PM, kickout <kyle.kocak at gmail.com> wrote:
> I've scoured the archives but have found no concrete answer to my question.
> Problem: Two data sets
> 1st data set(x) = 20,000 rows
> 2nd data set(y) = 5,000 rows
> Both have the same column names, the column of interest to me is a variable
> called strain.
> For example, a strain named "Chab1405" appears in x 150 times and in y 25
> times...
> Strain "Chab1999" appears 200 times in x and not at all in y (so I don't want
> it retained).
> I want to create a new data frame that has all 175 measurements for
> "Chab1405" and any other strain that appears in both data sets,
> but not strains that appear in only one data set. So I want the
> intersection of the two data sets (maybe?).
> I've tried x %in% y, but that only gives TRUE/FALSE
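The `%in%` attempt in the question is closer than it looks: applied to the `strain` columns rather than to whole data frames, the TRUE/FALSE vector it returns can index rows directly. The following is a minimal alternative sketch, not the method from the reply above, assuming both data frames have a `strain` column; the small inputs are hypothetical.

```r
# Hypothetical small inputs; only the strain column matters here
x <- data.frame(strain = c("Chab1405", "Chab1405", "Chab1999"),
                value  = 1:3, stringsAsFactors = FALSE)
y <- data.frame(strain = c("Chab1405", "Chab2000"),
                value  = 4:5, stringsAsFactors = FALSE)

# Logical vectors: does each row's strain also occur in the other data frame?
in_both_x <- x$strain %in% y$strain
in_both_y <- y$strain %in% x$strain

# Use the TRUE/FALSE vectors to keep only shared-strain rows, then stack them
dc <- rbind(x[in_both_x, ], y[in_both_y, ])
dc
```

This skips the `table()` step: a row is kept whenever its strain occurs at least once in the other data frame, which is the same criterion `h()` applies.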
