[R] sqldf not joining all the fields

David Winsemius dwinsemius at comcast.net
Fri Mar 12 18:42:39 CET 2010


If I assign the file input to y_data and change you sqldf to

 > z  <- sqldf("select * from x_data left join y_data using  
(item_code)"); z

I can replicate your result. Even after changing the types of the two  
item_code fields to match I still get the same result and when I see  
to what degree they share values I get:

 > sum(x_data$item_code %in% y_data$item_code)
[1] 2
 > sum(y_data$item_code %in% x_data$item_code)
[1] 2


So why are you so sure they are "complete" as you claimed in your  
first email.

-- 
David.


On Mar 12, 2010, at 12:29 PM, David Winsemius wrote:

> You have now given two different assignments to x_data and none to  
> y_data:
>
> The str( from the file access offering:
>
> > str(x_data)
> 'data.frame':	2848 obs. of  5 variables:
> $ item_code       : int  100 110 150 160 161 164 200 210 212 220 ...
> $ name            : chr  "NEONACLEX K" "NEONACLEX" "MESORB"  
> "ABSORBENT CELLULOSE MESO" ...
> $ formulation_code: chr  "TABS" "TABS" "DRESS" "DRESS" ...
> $ strength        : chr  "NULL" "5MG" "10CMX10CM" "10CMX10CM" ...
> $ bnf_code        : chr  "2.2.8" "2.2.1" "20.3.1" "20.3.1" ...
>
> The str from assignment from the dput offering
> > str(x_data)
> 'data.frame':	10 obs. of  10 variables:
> $ prochi         : chr  "CAO0000713" "CAO0000713" "CAO0000713"  
> "CAO0000713" ...
> $ prescribed_date: chr  "22/06/2001" "28/04/2000" "10/04/2000"  
> "03/07/2000" ...
> $ dataMonth      : chr  "NULL" "NULL" "NULL" "NULL" ...
> $ item_code      : chr  "842752" "7800" "842652" "842652" ...
> $ res_seqno      : chr  "NULL" "NULL" "NULL" "NULL" ...
> $ quantity       : chr  "60" "100G" "60" "60" ...
> $ directions     : chr  "1/D" "A/TD" "1/D" "1/D" ...
> $ no_of_packs    : chr  "NULL" "NULL" "NULL" "NULL" ...
> $ datasource     : chr  "TSF" "TSF" "TSF" "TSF" ...
> $ scan_ref_no    : chr  "NULL" "NULL" "NULL" "NULL" ...
>
> This code "worked", but it is not clear that the x-y assignments  
> were correct:
>
> x_data <- read.table(file="http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt 
> ", header = TRUE, sep = "|", quote = "\"'",
>               dec = ".",as.is = TRUE,na.strings = "NA",colClasses =  
> NA,
> nrows = 3864284,
>               skip = 0, check.names = TRUE,fill=TRUE,
>               strip.white = TRUE, blank.lines.skip = TRUE,
>               comment.char = "#", allowEscapes = FALSE, flush = FALSE,
>               fileEncoding = "", encoding = "unknown")
>
> -- 
> David.
>
> On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:
>
>>
>> The y_data file has over 9000 rows in it so I thought it would be  
>> more
>> practical to give you the file to download....
>> -- 
>> View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT



More information about the R-help mailing list