[R] sqldf not joining all the fields

Newbie19_02 nvanzuydam at gmail.com
Fri Mar 12 17:56:47 CET 2010


Dear R users,

I have two data frames that were read from text files as follows:

x_data <- read.table("x.txt", header = TRUE, sep = "|", quote = "\"'",
                dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
nrows = 3864284,
                skip = 0, check.names = TRUE,fill=TRUE,
                strip.white = TRUE, blank.lines.skip = TRUE,
                comment.char = "#", allowEscapes = FALSE, flush = FALSE,
                fileEncoding = "", encoding = "unknown")

x_data

prochi prescribed_date dataMonth item_code res_seqno quantity directions
CAO0000713      22/06/2001      NULL    842752      NULL       60        1/D
CAO0000713      28/04/2000      NULL      7800      NULL     100G       A/TD
CAO0000713      10/04/2000      NULL    842652      NULL       60        1/D
CAO0000713      03/07/2000      NULL    842652      NULL       60        1/D
CAO0000713      09/01/2001      NULL    842752      NULL       60        1/D
CAO0000713      16/10/2001      NULL    842752      NULL       60        1/D
CAO0000713      16/08/2001      NULL    842752      NULL       60        1/D
CAO0000713      17/09/1993      NULL     39620      NULL      5ML        NIL
CAO0000713      01/05/2001      NULL    842752      NULL       60        1/D
CAO0000713      05/03/2001      NULL    842752      NULL       60        1/D



y_data

 item_code    name                              formulation_code  strength
bnf_code
100              NEONACLEX K                    TABS      NULL    2.2.8
110                NEONACLEX                     TABS       5MG    2.2.1
50                   MESORB                         DRESS 10CMX10CM   20.3.1
160 ABSORBENT CELLULOSE MESO            DRESS 10CMX10CM   20.3.1
161 ABSORBENT CELLULOSE MESO            DRESS 10CMX15CM   20.3.1
164 ABSORBENT CELLULOSE MESO            DRESS 20CMX25CM   20.3.1
200                  SEPTRIN                        TABS     480MG    5.1.8
210          SEPTRIN PAED SF                    SUSP 240MG/5ML    5.1.8
212            SEPTRIN ADULT                     SUSP 480MG/5ML    5.1.8
220            SEPTRIN FORTE                     TABS     960MG    5.1.8
 etc....


contains all the information for the item codes
y was read in in the same way.

I then used the following code:

z  <- sqldf("select * from x left join y using (code)")

when I use this on my real data I get an output:
 prochi prescribed_date dataMonth item_code res_seqno quantity directions
1  CAO0000713      22/06/2001      NULL    842752      NULL       60       
1/D
2  CAO0000713      28/04/2000      NULL      7800      NULL     100G      
A/TD
3  CAO0000713      10/04/2000      NULL    842652      NULL       60       
1/D
4  CAO0000713      03/07/2000      NULL    842652      NULL       60       
1/D
5  CAO0000713      09/01/2001      NULL    842752      NULL       60       
1/D
6  CAO0000713      16/10/2001      NULL    842752      NULL       60       
1/D
7  CAO0000713      16/08/2001      NULL    842752      NULL       60       
1/D
8  CAO0000713      17/09/1993      NULL     39620      NULL      5ML       
NIL
9  CAO0000713      01/05/2001      NULL    842752      NULL       60       
1/D
10 CAO0000713      05/03/2001      NULL    842752      NULL       60       
1/D
   no_of_packs datasource scan_ref_no         name formulation_code strength
1         NULL        TSF        NULL         <NA>             <NA>     <NA>
2         NULL        TSF        NULL BETNOVATE RD             OINT   0.025%
3         NULL        TSF        NULL         <NA>             <NA>     <NA>
4         NULL        TSF        NULL         <NA>             <NA>     <NA>
5         NULL        TSF        NULL         <NA>             <NA>     <NA>
6         NULL        TSF        NULL         <NA>             <NA>     <NA>
7         NULL        TSF        NULL         <NA>             <NA>     <NA>
8         NULL        TSF        NULL   GAMMABULIN              INJ    320MG
9         NULL        TSF        NULL         <NA>             <NA>     <NA>
10        NULL        TSF        NULL         <NA>             <NA>     <NA>
   bnf_code
1      <NA>
2  13.4.1.2
3      <NA>
4      <NA>
5      <NA>
6      <NA>
7      <NA>
8      14.5
9      <NA>
10     <NA>


There is absolutely no reason for there to be <NA> anywhere as the
information for both the tables is complete.

Not sure what the problem is?

Thanks,
Natalie
-- 
View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590786.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list