[R] Creating binary variable depending on strings of two dataframes

David Winsemius dwinsemius at comcast.net
Fri May 6 19:41:30 CEST 2011


On May 6, 2011, at 11:35 AM, Pete Pete wrote:

>
> Gabor Grothendieck wrote:
>>
>> On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete <noxyport at gmail.com> wrote:
>>>
>>> Hi,
>>> consider the following two dataframes:
>>> x1=c("232","3454","3455","342","13")
>>> x2=c("1","1","1","0","0")
>>> data1=data.frame(x1,x2)
>>>
>>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>>> y2=c("E1","F3","F5","E1","E2","H4","F8","G3","E1","H2")
>>> data2=data.frame(y1,y2)
>>>
>>> I need a new column x3 in dataframe data1 that is 1 if data2 has a row
>>> with y1 equal to x1 and y2 equal to "E1", and 0 otherwise. The result
>>> for data1 should look like this:
>>>     x1 x2 x3
>>> 1  232  1  1
>>> 2 3454  1  1
>>> 3 3455  1  0
>>> 4  342  0  0
>>> 5   13  0  1
>>>
>>> I think an SQL command could help me, but I am too inexperienced with it
>>> to get there.
>>>
>>
>> Try this:
>>
>>> library(sqldf)
>>> sqldf("select x1, x2, max(y2 = 'E1') x3 from data1 d1
>>>   left join data2 d2 on (x1 = y1)
>>>   group by x1, x2 order by d1.rowid")
>>    x1 x2 x3
>> 1  232  1  1
>> 2 3454  1  1
>> 3 3455  1  0
>> 4  342  0  0
>> 5   13  0  1
>>
>>
snipped Gabor's sig
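For readers without sqldf, here is a base-R sketch that should give the same x3 column, assuming data1 and data2 as defined in the first question. It collects the y1 values that occur with "E1" and flags each x1 found among them; `e1_ids` is just an illustrative name. `stringsAsFactors = FALSE` is set explicitly because older R versions default to factors (`%in%` works on either).

```r
# Data from the first question, with codes kept as character strings
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E1", "F3", "F5", "E1", "E2",
                           "H4", "F8", "G3", "E1", "H2"),
                    stringsAsFactors = FALSE)

# IDs in data2 that have at least one "E1" row
e1_ids <- data2$y1[data2$y2 == "E1"]

# 1 if x1 is among those IDs, else 0; the row order of data1 is kept
data1$x3 <- as.integer(data1$x1 %in% e1_ids)
data1
```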
>
> That works great, but I need to automate this a bit more. Consider the
> following example:
>
> list1=c("A01","B04","A64","G84","F19")
>
> x1=c("232","3454","3455","342","13")
> x2=c("1","1","1","0","0")
> data1=data.frame(x1,x2)
>
> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
> y2=c("E13","B04","F19","A64","E22","H44","F68","G84","F19","A01")
> data2=data.frame(y1,y2)
>
> I now want to create a loop that, for every value in list1, creates a new
> binary variable in data1. The result should look like:
>   x1 x2 A01 B04 A64 G84 F19
>  232  1   0   1   0   0   0
> 3454  1   0   0   1   0   1
> 3455  1   0   0   0   0   0
>  342  0   0   0   0   0   0
>   13  0   1   0   0   1   1

Loops!?! We don't need no steenking loops!

 > xtb <-  with(data2, table(y1,y2))
 > cbind(data1, xtb[match(data1$x1, rownames(xtb)), ] )
       x1 x2 A01 A64 B04 E13 E22 F19 F68 G84 H44
232   232  1   0   0   1   1   0   0   0   0   0
3454 3454  1   0   1   0   0   0   1   0   0   0
3455 3455  1   0   0   0   0   1   0   0   0   0
342   342  0   0   0   0   0   0   0   0   0   1
13     13  0   1   0   0   0   0   1   1   1   0

I am guessing that you were too ... er, busy? ... to complete the table?
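Two details of the table() approach may matter when adapting it. First, it keeps every code in y2, not just those in list1. Second, it records counts, so a y1/y2 pair that occurred twice would show 2 rather than 1. Below is a sketch that restricts the columns to list1 (in list1's order) and collapses the counts to 0/1 flags, assuming the second example's data. Any x1 missing from data2 would get an NA row from match(), so those NAs are set to 0 here. `flags` and `result` are illustrative names.

```r
list1 <- c("A01", "B04", "A64", "G84", "F19")
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E13", "B04", "F19", "A64", "E22",
                           "H44", "F68", "G84", "F19", "A01"),
                    stringsAsFactors = FALSE)

# A factor with levels = list1 turns other codes into NA, which table()
# drops, and fixes the column order to that of list1.
xtb <- table(data2$y1, factor(data2$y2, levels = list1))

# Look up each data1 row; IDs absent from data2 would give NA rows.
flags <- unclass(xtb)[match(data1$x1, rownames(xtb)), , drop = FALSE]
flags[is.na(flags)] <- 0L

# Collapse counts to 0/1 indicators and drop the lookup row names.
flags <- (flags > 0) * 1L
rownames(flags) <- NULL

result <- cbind(data1, flags)
result
```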

-- 

David Winsemius, MD
West Hartford, CT



More information about the R-help mailing list