[R] Subsetting dataframes based on column names

David Winsemius dwinsemius at comcast.net
Wed Sep 23 00:35:43 CEST 2009


On Sep 22, 2009, at 5:58 PM, Corey Sparks wrote:

> Dear R users,
> I am interested in taking the columns from multiple dataframes, the  
> problem is that the different dataframes have different combinations  
> of the same variable names, here's a simple example:
> a<-rep(1:10)
> b<-rep(1:10)
> c<-rep(21:30)
> d<-rep(31:40)
>
> dat.a<-data.frame(a,b,c,d)
> names(dat.a)<-c("a", "b", "c", "d")
>
> dat.b<-data.frame(a,c,d)
> names(dat.b)<-c("a", "c", "d")
>
> I would like to first see whether the names in the larger dataframe match  
> those of the smaller one (they share the same variables):
>
> names(dat.a)%in%names(dat.b)
>
>
> Could anyone help with this problem? I would basically like to form  
> a subset of dat.a that matches the variable names in dat.b. If  
> there were only a few variables, this would be easier, but I have  
> between 4 and 5 thousand variables in each dataset.
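
A minimal base-R sketch of the name check the question asks about, assuming only the toy data above (rebuilt here with 1:10 directly, since rep(1:10) returns the same vector). intersect() gives the shared names, setdiff() gives the names of dat.a that dat.b lacks, and all(... %in% ...) confirms that every column of dat.b also exists in dat.a. The object names shared and missing_in_b are illustrative, not taken from the thread.

```r
# Toy data from the question; 1:10 is already a vector, so rep() is not needed
dat.a <- data.frame(a = 1:10, b = 1:10, c = 21:30, d = 31:40)
dat.b <- data.frame(a = 1:10, c = 21:30, d = 31:40)

# Hypothetical helper objects, for illustration only
# Names present in both data frames, in the column order of dat.a
shared <- intersect(names(dat.a), names(dat.b))
print(shared)         # -> "a" "c" "d"

# Names of dat.a that dat.b does not have
missing_in_b <- setdiff(names(dat.a), names(dat.b))
print(missing_in_b)   # -> "b"

# TRUE only if every column name of dat.b also occurs in dat.a
all(names(dat.b) %in% names(dat.a))
```

With thousands of columns, printing missing_in_b is usually more informative than scanning the full logical vector from %in%.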

I have never tried this on the scale you propose, but on your toy  
example, here is what works:

 > names(dat.a)%in%names(dat.b)  # your code, which returns a logical vector
[1]  TRUE FALSE  TRUE  TRUE

 > subset(dat.a, select= names(dat.a)%in%names(dat.b) )
     a  c  d
1   1 21 31
2   2 22 32
3   3 23 33
4   4 24 34
5   5 25 35
6   6 26 36
7   7 27 37
8   8 28 38
9   9 29 39
10 10 30 40
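
A sketch of an alternative in base R, not the method from the reply above: select the columns by name with `[`. Using intersect(names(dat.b), names(dat.a)) as the index keeps dat.b's column order and avoids the "undefined columns selected" error that dat.a[, names(dat.b)] would raise if dat.b had a name that dat.a lacks; drop = FALSE keeps the result a data frame even when only one column matches. The object names keep, sub.a and sub.a2 are hypothetical, and this has not been benchmarked at the 4,000 to 5,000 column scale mentioned in the question.

```r
# Toy data from the question
dat.a <- data.frame(a = 1:10, b = 1:10, c = 21:30, d = 31:40)
dat.b <- data.frame(a = 1:10, c = 21:30, d = 31:40)

# Names of dat.b that really exist in dat.a, in dat.b's column order
keep <- intersect(names(dat.b), names(dat.a))

# Select columns by name; drop = FALSE keeps a data.frame even for one column
sub.a <- dat.a[, keep, drop = FALSE]

# Logical-index form, equivalent to the subset() call above (dat.a's column order)
sub.a2 <- dat.a[, names(dat.a) %in% names(dat.b), drop = FALSE]

# Both give columns a, c, d here because the two orders happen to agree
all.equal(sub.a, sub.a2)
```

The two forms differ only in column order: the name-based index follows dat.b, while the logical index (and subset()) follows dat.a.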


-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT




More information about the R-help mailing list