# [R] combining data from different datasets

Barry Rowlingson b.rowlingson at lancaster.ac.uk
Fri Oct 24 20:05:58 CEST 2008

```
2008/10/24 Gabor Grothendieck <ggrothendieck at gmail.com>:

> NA and "NA" are not the same:
>
>> DF <- data.frame(x = c("a", "NA", NA))
>> DF
>     x
> 1    a
> 2   NA
> 3 <NA>
>>
>> is.na(NA)
> [1] TRUE
>> is.na("NA")
> [1] FALSE

Yes, but unless you tell it otherwise, read.table will treat Namibia's
country code, NA, as a missing value, even in a column of alphabetic
strings:

1,US
2,NA
3,UK

  V1   V2
1  1   US
2  2 <NA>
3  3   UK

So you think you can use na.strings? The na.strings setting applies to
every column, not just the text one, so a numeric column that has real
NAs gets converted into a Factor. Here's some data:

$ cat test.dat
1,US
2,NA
3,UK
NA,FR
4,PT

We need column 1 to be integer with an NA, and column 2 to be text
with a real "NA" and not a <NA>:

Try #1 (NAive effort) reads NA(mibia) as NA(missing), keeps V1 as integers:

  V1   V2
1  1   US
2  2 <NA>
3  3   UK
4 NA   FR
5  4   PT

= FAIL

Try #2, na.strings on its own (set so that "NA" is no longer treated as
missing), keeps Namibia but turns V1 into a Factor:

  V1 V2
1  1 US
2  2 NA
3  3 UK
4 NA FR
5  4 PT

'data.frame':	5 obs. of  2 variables:
 $ V1: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 5 4
 $ V2: Factor w/ 5 levels "FR","NA","PT",..: 5 2 4 1 3

= FAIL

#3 let's try colClasses:

  V1   V2
1  1   US
2  2 <NA>
3  3   UK
4 NA   FR
5  4   PT

= FAIL

#4 So... let's try to specify colClasses and na.strings:

  V1 V2
1  1 US
2  2 NA
3  3 UK
4 NA FR
5  4 PT

- looks good:

'data.frame':	5 obs. of  2 variables:
 $ V1: num  1 2 3 NA 4
 $ V2: chr  "US" "NA" "UK" "FR" ...

= WIN!

I'm not certain how that works. I guess the conversion of column 1 to
numeric produces the NA, rather than a match against the na.strings
parameter.

Barry

```
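
The commands behind the four attempts are not shown in the archived message, so here is a minimal sketch of how they might be reproduced. The calls are a guess rather than what was originally typed: `read.csv(text = ...)` stands in for reading `test.dat`, `na.strings = ""` is an assumed value (any setting that leaves out `"NA"` should behave the same way for this data), and `stringsAsFactors = FALSE` is set so the text column comes back as character on any R version.

```r
# Same five lines as test.dat, held in a string so the example is self-contained.
test_lines <- "1,US\n2,NA\n3,UK\nNA,FR\n4,PT"

# Try #1: defaults. na.strings is "NA", so Namibia's code is read as missing.
t1 <- read.csv(text = test_lines, header = FALSE, stringsAsFactors = FALSE)

# Try #2: na.strings alone (assumed value ""). "NA" in V2 survives as text,
# but the post reports that V1 stops being numeric because of its "NA" entry.
t2 <- read.csv(text = test_lines, header = FALSE, stringsAsFactors = FALSE,
               na.strings = "")

# Try #3: colClasses alone. V2 is character, yet "NA" still matches na.strings.
t3 <- read.csv(text = test_lines, header = FALSE, stringsAsFactors = FALSE,
               colClasses = c("numeric", "character"))

# Try #4: colClasses plus an na.strings value that leaves out "NA".
t4 <- read.csv(text = test_lines, header = FALSE, stringsAsFactors = FALSE,
               colClasses = c("numeric", "character"), na.strings = "")

# Inspect the column types, then check which V2 entries count as missing.
str(t4)
is.na(t1$V2)
is.na(t4$V2)
```

If this behaves as the post describes, only the last call gives a numeric V1 with a genuine NA alongside a V2 that keeps the literal string "NA". Note that with `na.strings = ""` an empty field in a character column would be read as missing, which may or may not be what you want for other data.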