[R] create list of names where two df contain == values

David Winsemius dwinsemius at comcast.net
Wed Nov 16 15:04:22 CET 2011


On Nov 16, 2011, at 8:03 AM, Rob Griffin wrote:

> Hello again... sorry to be posting yet again, but I hadn't  
> anticipated this problem.
>
> I am trying to now put the names found in one column in data frame 1
> (let's call it df.1[,1]) into a list from the rows where the values
> in df.1[,2] match values in a column of another dataframe (df.2[3])
> I tried to write this function so that it put the list of names
> (called Iffy) where the 2 criteria (df.1[141] and df.2[21]) matched
> but I think it's too complex for a beginner R-enthusiast
>
> ify<-function(x,y,a,b,c) if(x[[,a]]==y[[,b]]) {list(x[[,c]])} else  
> {NULL}

When you are building a helper function for use with apply, you
should realize that that function will be getting a vector, not a list.
The construction "[[,a]]" looks pretty strange as well. Generally
column selection is done with one of "[[a]]" or "[ , a]". I am not
absolutely sure that you cannot have "[[,]]" but I was under the
impression you could not. AND you shouldn't be returning NULLs if what
you really want are NA's.
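
To make those two points concrete, here is a small sketch. The data frame
"d" is made up for illustration and is not the poster's data. It checks
that "[[a]]" and "[ , a]" pull out the same column, and that apply hands
each row to the function as a plain vector rather than a list.

```r
# Hypothetical toy data, not the poster's
d <- data.frame(name = c("a", "b", "c"), value = c(1.5, 2.5, 3.5),
                stringsAsFactors = FALSE)

# Two usual ways to select the second column; both return the same vector
col_a <- d[[2]]
col_b <- d[, 2]
same_column <- identical(col_a, col_b)

# apply() converts the data frame to a matrix first, then passes each row
# to FUN as an atomic vector (character here, since one column is character)
row_classes <- apply(d, 1, function(r) class(r))
row_is_list <- apply(d, 1, function(r) is.list(r))
```

Note that because of the matrix conversion, the numeric column arrives as
text inside the helper, which is one more reason apply is awkward for
mixed-type data frames.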


> Iffy<-apply(  df.1,  1,  FUN=ify,  x=df.1,  y=df.2,  a=2,  b=3,   
> c=1  )

So a single vector will be assigned to the x argument in the ify  
function and the rest of the arguments will be populated from the  
other arguments. You do NOT need to supply an "x" argument in that  
list and if you do so you will throw an error.
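
A minimal sketch of that argument matching, using a made-up helper
"scaled_sum" and a small matrix (neither is from the original post). The
row vector is supplied by apply itself, so only the extra arguments are
named in the call.

```r
# Hypothetical helper: the first argument receives the row vector from apply()
scaled_sum <- function(x, k) sum(x) * k

m <- matrix(1:6, nrow = 2)   # rows are c(1, 3, 5) and c(2, 4, 6)

# Extra arguments are passed by name; the row itself is supplied by apply()
ok <- apply(m, 1, scaled_sum, k = 2)

# Naming "x" as well leaves no slot for the row vector, so the call fails;
# the error message is captured here instead of stopping the script
msg <- tryCatch(apply(m, 1, scaled_sum, x = m, k = 2),
                error = function(e) conditionMessage(e))
```

Here "ok" holds the two scaled row sums, while "msg" holds the text of the
error raised by the second call.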

Furthermore you cannot expect the apply function to keep track of
which row it's on for indexing a different data.frame. The mapply
function might be used for this purpose but I am going to suggest a
much cleaner solution below.
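
For completeness, a rough sketch of what the mapply route could look like.
The data frames "d1" and "d2" and their column names are invented for the
example, and it assumes both have the same number of rows.

```r
# Hypothetical toy data frames with the same number of rows
d1 <- data.frame(name = c("a", "b", "c"), v = c(1, 2, 3),
                 stringsAsFactors = FALSE)
d2 <- data.frame(w = c(1, 9, 3))

# mapply() walks the three vectors in parallel, one element of each per call,
# so the scalar if/else is safe inside the helper
picked <- mapply(function(v1, v2, nm) if (v1 == v2) nm else NA_character_,
                 d1$v, d2$w, d1$name)
```

Each call sees one value from each vector, so the row bookkeeping that
apply cannot do is handled by mapply itself.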


>
> But this didn't work... Error in FUN(newX[, i], ...) : unused  
> argument(s) (newX[, i])
>
>
> Here is a dataset that replicates the problem, you'll notice the "h"  
> criteria values are different between the two dataframes and  
> therefore it would produce a list  of the 9 letters where the two  
> criteria columns matched (a,b,c,d,e,f,g,i,j):

If you know that df.1 and df.2 have the same number of rows then use
the ifelse function, which is designed to work on vectors. The
if(){}else{} construct is NOT:

 > ifelse( df.1[,2] ==df.2[,3], {as.character(df.1[,1])} ,  {NA} )
  [1] "a" "b" "c" "d" "e" "f" "g" NA  "i" "j"
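
If only the matching names are wanted, without the NA placeholder, one
possible follow-up is to drop the NAs or to subset with the logical
comparison directly. This is a sketch on made-up data frames "d1" and "d2",
built with character columns so that no factor conversion is involved.

```r
# Hypothetical toy data; the second row is made to differ
d1 <- data.frame(name = c("a", "b", "c"), v = c(1, 2, 3),
                 stringsAsFactors = FALSE)
d2 <- data.frame(id = 1:3, w = c(1, 9, 3))

# Vectorized comparison with NA where the values differ
with_na <- ifelse(d1[, 2] == d2[, 2], d1[, 1], NA)

# Option 1: remove the NA entries afterwards
matched <- with_na[!is.na(with_na)]

# Option 2: use the comparison itself as a row index
direct <- d1[d1[, 2] == d2[, 2], 1]
```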

The reason as.character was needed lies in the fact that you
constructed df.1[,1] as a factor variable. As I understand it,
ifelse tries to make it numeric to match the datatype of the
comparison. I've never understood this frankly. Maybe someone can
educate me.
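
The behaviour can be reproduced on a tiny example. The factor is built
explicitly with factor() here, since whether data.frame() creates factors
depends on the stringsAsFactors setting; the names "f" and "keep" are made
up for the illustration. It only shows what happens, not a full account of
why.

```r
f <- factor(c("x", "y", "z"))
keep <- c(TRUE, FALSE, TRUE)

# Passing the factor directly yields its underlying integer codes
codes <- ifelse(keep, f, NA)

# Converting to character first yields the labels
labels <- ifelse(keep, as.character(f), NA)
```

The result of ifelse takes its shape from the test vector and is filled
with the raw values of the "yes" argument, so the factor's levels
attribute does not survive.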

If you wanted a function that allowed you to specify the columns and  
dataframes then consider this

ret3.m1.eq.n2 <- function(df1, df2, col1, col2, col3){
                 ifelse( df1[,col1] == df2[,col2],
                         as.character(df1[,col3]), NA )
                 }
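
A usage sketch for a function of this kind. The name "names_where_equal"
and the four-row data frames are made up for the example; the column
layout mirrors the poster's (criterion in column 2 of the first data frame
and column 3 of the second, names in column 1).

```r
# Hypothetical name; same idea as the function above, written out in full
names_where_equal <- function(df1, df2, col1, col2, col3) {
  ifelse(df1[, col1] == df2[, col2], as.character(df1[, col3]), NA)
}

# Hypothetical toy data with the same number of rows in each data frame
d1 <- data.frame(Letters = letters[1:4],
                 numb1 = c(0.5, 1.5, 2.5, 3.5))
d2 <- data.frame(Letters = letters[1:4],
                 extra.col = 1:4,
                 numb1 = c(0.5, 1.5, 2.5, 3.5))
d2[3, 3] <- 12   # make the third row differ

res <- names_where_equal(d1, d2, col1 = 2, col2 = 3, col3 = 1)
```

With this toy data the third element of "res" is NA and the others are the
corresponding letters.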


>
>
>
> df.1<-data.frame(rep(letters[1:10]))
> colnames(df.1)[1]<-("Letters")
> set.seed(1)
> df.1$numb1<-rnorm(10,1,1)
> df.1$extra.col<-c(1,2,3,4,5,6,7,8,9,10)
> df.1$id<-c("CG234","CG232","CG441","CG128","CG125",
>            "CG182","CG982","CG541","CG282","CG154")
> df.1
>
> df.2<-data.frame(rep(letters[1:10]))
> colnames(df.2)[1]<-("Letters")
> set.seed(1)
> df.2$extra.col<-c(1,2,3,4,5,6,7,8,9,10)
> df.2$numb1<-rnorm(10,1,1)
> df.2$id<-c("CG234","CG232","CG441","CG128","CG125",
>            "CG182","CG982","CG541","CG282","CG154")
> df.2[8,3]<-12
>
> df.1
> df.2
>
>
>
>
> Your patience is much appreciated,
> Rob
>

David Winsemius, MD
West Hartford, CT