[R] Filtering a dataset's columns by another dataset's column names

Marc Schwartz marc_schwartz at comcast.net
Fri Feb 27 18:36:55 CET 2009


on 02/27/2009 11:27 AM Josh B wrote:
> Hello all,
> 
> I hope some of you can come to my rescue, yet again.
> 
> I have two genetic datasets, and I want one of the datasets to have only the columns that are in common with the other dataset. 
> Here is a toy example (my real datasets have hundreds of columns):
> 
> Dataset 1:
> 
> Individual    SNP1    SNP2    SNP3    SNP4    SNP5
> 1    A    G    T    C    A
> 2    T    C    A    G    T
> 3    A    C    T    C    A
> 
> Dataset 2:
> 
> Individual    SNP1    SNP3    SNP5    SNP6    SNP7
> 4    A    T    T    G    C
> 5    T    A    A    G    G
> 6    A    A    T    C    G
> 
> I want Dataset1 to have only columns that are also represented in Dataset 2, i.e., I want to generate a new Dataset 3 that looks like this:
> 
> Individual    SNP1    SNP3    SNP5
> 1    A    T    A
> 2    T    A    T
> 3    A    T    A
> 
> Does anyone know how I could do this? Keep in mind that this is not a simple merge, as in the "merge" function.
> 
> Thanks very much for your help everyone.
> Josh B.
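
The session below assumes the two toy datasets are already loaded as data frames named DF1 and DF2; the post does not show that step. A minimal sketch of one way to build them from the question's tables. The values are copied from the question, and stringsAsFactors = FALSE is an assumption chosen so the SNP calls stay character vectors:

```r
# Rebuild the two toy datasets from the question as data frames.
# stringsAsFactors = FALSE keeps the SNP calls as character vectors
# (factors were the default before R 4.0.0).
DF1 <- data.frame(Individual = 1:3,
                  SNP1 = c("A", "T", "A"),
                  SNP2 = c("G", "C", "C"),
                  SNP3 = c("T", "A", "T"),
                  SNP4 = c("C", "G", "C"),
                  SNP5 = c("A", "T", "A"),
                  stringsAsFactors = FALSE)

DF2 <- data.frame(Individual = 4:6,
                  SNP1 = c("A", "T", "A"),
                  SNP3 = c("T", "A", "A"),
                  SNP5 = c("T", "A", "T"),
                  SNP6 = c("G", "G", "C"),
                  SNP7 = c("C", "G", "G"),
                  stringsAsFactors = FALSE)
```
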

> Same.Cols <- intersect(names(DF1), names(DF2))

> Same.Cols
[1] "Individual" "SNP1"       "SNP3"       "SNP5"

> rbind(DF1[, Same.Cols], DF2[, Same.Cols])
  Individual SNP1 SNP3 SNP5
1          1    A    T    A
2          2    T    A    T
3          3    A    T    A
4          4    A    T    T
5          5    T    A    A
6          6    A    A    T
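
The question asks for a Dataset 3 that holds only the three rows of Dataset 1. The rbind() call above also appends the rows of Dataset 2. A minimal sketch of the narrower request follows. It reuses Same.Cols and rebuilds DF1 and DF2 from the question's tables so that it runs on its own. The name DF3 is only illustrative:

```r
# Toy data copied from the question (character SNP calls, not factors).
DF1 <- data.frame(Individual = 1:3,
                  SNP1 = c("A", "T", "A"), SNP2 = c("G", "C", "C"),
                  SNP3 = c("T", "A", "T"), SNP4 = c("C", "G", "C"),
                  SNP5 = c("A", "T", "A"), stringsAsFactors = FALSE)
DF2 <- data.frame(Individual = 4:6,
                  SNP1 = c("A", "T", "A"), SNP3 = c("T", "A", "A"),
                  SNP5 = c("T", "A", "T"), SNP6 = c("G", "G", "C"),
                  SNP7 = c("C", "G", "G"), stringsAsFactors = FALSE)

# Column names present in both data frames, in DF1's column order.
Same.Cols <- intersect(names(DF1), names(DF2))

# Subset DF1 alone: its own three rows, restricted to the shared columns.
DF3 <- DF1[, Same.Cols]
DF3
```

DF3 should contain the Individual, SNP1, SNP3 and SNP5 columns for individuals 1 to 3, which matches the Dataset 3 table in the question. rbind() is only needed if the rows of DF2 should be stacked underneath.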


See ?intersect. It returns the column names that the two data frames have
in common, which you can then use to subset each of them before passing
the results to rbind().
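
With hundreds of columns, it may be convenient to wrap this in a small function. The sketch below uses a hypothetical helper, keep_shared_cols(), which is not part of base R. It passes drop = FALSE so the result stays a data frame even when only one column is shared. It also uses setdiff() to list the columns that were dropped:

```r
# Hypothetical helper (not in base R): keep the columns of `x` whose names
# also occur in `y`, preserving x's column order.
keep_shared_cols <- function(x, y) {
  shared <- intersect(names(x), names(y))
  # drop = FALSE: a single shared column still comes back as a data frame
  x[, shared, drop = FALSE]
}

# Toy data copied from the question.
DF1 <- data.frame(Individual = 1:3,
                  SNP1 = c("A", "T", "A"), SNP2 = c("G", "C", "C"),
                  SNP3 = c("T", "A", "T"), SNP4 = c("C", "G", "C"),
                  SNP5 = c("A", "T", "A"), stringsAsFactors = FALSE)
DF2 <- data.frame(Individual = 4:6,
                  SNP1 = c("A", "T", "A"), SNP3 = c("T", "A", "A"),
                  SNP5 = c("T", "A", "T"), SNP6 = c("G", "G", "C"),
                  SNP7 = c("C", "G", "G"), stringsAsFactors = FALSE)

DF3 <- keep_shared_cols(DF1, DF2)
names(DF3)

# Columns of DF1 that have no counterpart in DF2 (here SNP2 and SNP4)
setdiff(names(DF1), names(DF2))
```

The drop = FALSE argument matters mostly for edge cases. Without it, DF1[, "SNP2"] would return a bare character vector rather than a one-column data frame.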

HTH,

Marc Schwartz




More information about the R-help mailing list