[R] inner join sqldf

Newbie19_02 nvanzuydam at gmail.com
Mon Mar 15 13:19:17 CET 2010


Hi,

I have two dataframes that have some common columns.   I would like to join
them by the common columns prochi and prescribed_date as there are duplicate
prochis but they will be made unique by date.  I tried doing an inner join
but that just duplicated the columns whereas I would like the information
from the test_sql_tsf to fill the NAs in the test_sql_psd common columns.  

require(sqldf)

test_sql_psd <-  http://n4.nabble.com/file/n1593282/test_sql_psd.txt
test_sql_psd.txt , header=TRUE, sep="\t", dec=".", na.strings="NA",
check.names=TRUE, quote= "\"'")


test_sql_tsf <-read.table(file=
http://n4.nabble.com/file/n1593282/test_sql_tsf.txt test_sql_tsf.txt , ,
header=TRUE, sep="\t", dec=".", na.strings="NA", check.names=TRUE, quote=
"\"'")


test_sql_innerjoin <- sqldf("select * test_sql_psd inner join test_sql_tsf
on test_sql_psd.prochi=test_sql_tsf.prochi")

colnames(test_sql_psd)
 [1] "prochi"           "prescribed_date"  "dataMonth"        "item_code"       
 [5] "res_seqno"        "quantity"         "directions"       "no_of_packs"     
 [9] "datasource"       "scan_ref_no"      "name"            
"approved_name"   
[13] "formulation_code" "strength"         "measure_code"     "bnf_code"        
[17] "bnf_description" 

 colnames(test_sql_tsf)
 [1] "prochi"           "prescribed_date"  "dataMonth"        "item_code"       
 [5] "res_seqno"        "quantity"         "directions"       "no_of_packs"     
 [9] "datasource"       "scan_ref_no"      "name"            
"formulation_code"
[13] "strength"         "bnf_code"   


with the result of:

colnames(test_sql_innerjoin)
 [1] "prochi"           "prescribed_date"  "dataMonth"        "item_code"       
 [5] "res_seqno"        "quantity"         "directions"       "no_of_packs"     
 [9] "datasource"       "scan_ref_no"      "name"            
"formulation_code"
[13] "strength"         "bnf_code"         "prochi"          
"prescribed_date" 
[17] "dataMonth"        "item_code"        "res_seqno"        "quantity"        
[21] "directions"       "no_of_packs"      "datasource"       "scan_ref_no"     
[25] "name"             "approved_name"    "formulation_code" "strength"        
[29] "measure_code"     "bnf_code"         "bnf_description" 


I'm not sure if I am using the correct sqldf command or if there is an
easier way to do this from the start.  I also tried 
test_sql_union<-sqldf("select * test_sql_tsf union select * test_sql_psd")
which gave me the same result as inner join.  I'm not sure if I am using the
correct commands for what I want to do?



Thanks for your help. 
Natalie

-- 
View this message in context: http://n4.nabble.com/inner-join-sqldf-tp1593282p1593282.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list