[R] Namibia becoming NA

(Ted Harding) Ted.Harding at manchester.ac.uk
Sun Jul 18 10:25:09 CEST 2010


On 18-Jul-10 05:47:03, Suresh Singh wrote:
> I have a data file in which one of the columns is country code and NA
> is the code for Namibia.
> When I read the data file using read.csv, NA for Namibia is being
> treated as null or "NA"
> 
> How can I prevent this from happening?
> 
> I tried the following but it didn't work
> input <- read.csv("padded.csv",header = TRUE,as.is = c("code2"))
> 
> thanks,
> Suresh

I suppose this was bound to happen, and in my view it represents
a bit of a mess! With a test file temp.csv:

  Code,Country
  DE,Germany
  IT,Italy
  NA,Namibia
  FR,France

  X <- read.csv("temp.csv")
  X
  #   Code Country
  # 1   DE Germany
  # 2   IT   Italy
  # 3 <NA> Namibia
  # 4   FR  France
  which(is.na(X))
  # [1] 3

exactly as Suresh describes. It does not help to surround the NA
in temp.csv with quotes:

  Code,Country
  DE,Germany
  IT,Italy
  "NA",Namibia
  FR,France

leads to exactly the same result. And I have tried every variation
of "as.is" and "colClasses" that I can think of, still with exactly
the same result!
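A self-contained version of the experiment above, as a sketch: it writes the unquoted test file to a temporary file (the tempfile() path is the only addition) and reads it with the default settings, with as.is, and with colClasses.

```r
# Recreate temp.csv in a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("Code,Country", "DE,Germany", "IT,Italy",
             "NA,Namibia", "FR,France"), tmp)

X  <- read.csv(tmp)                            # default settings
Xa <- read.csv(tmp, as.is = "Code")            # keep Code as character
Xc <- read.csv(tmp, colClasses = "character")  # force every column to character

# In all three, Namibia's code (row 3) comes back as a missing value
print(sapply(list(default = X, as.is = Xa, colClasses = Xc),
             function(d) is.na(d$Code[3])))
```

Whether Code is a factor or a character column in the default read depends on the R version (stringsAsFactors defaulted to TRUE before R 4.0.0); the missing value appears either way, because the conversion to <NA> happens while the fields are read, before any type conversion.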

Conclusion: If an entry in a data file is intended to become the
character value "NA", there seems to be no way of reading it in
directly. This should not be so: it should be preventable!

As a cure, and assuming that no other value in the country-code
column is genuinely missing (and so should be <NA>), I would suggest
(with Suresh's naming) something like the following after reading
in the file. The complication is that, unless it was read with
as.is, the variable code2 is a factor, and you cannot simply assign
the character value "NA" to its <NA> entries: R gives the warning
"invalid factor level, NA generated" and the entry stays <NA>.
Hence:

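  ix <- which(is.na(input$code2))  # rows where "NA" was read as <NA>
  Y  <- as.character(input$code2)   # drop the factor so any string fits
  Y[ix] <- "NA"                     # restore Namibia's code
  input$code2 <- factor(Y)          # back to a factor, now with level "NA"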

The corresponding code for my test example is:

  ix <- which(is.na(X$Code))
  Y  <- as.character(X$Code)
  Y[ix] <- "NA"
  X$Code <- factor(Y)

  X
  #   Code Country
  # 1   DE Germany
  # 2   IT   Italy
  # 3   NA Namibia
  # 4   FR  France
  which(is.na(X))
  # integer(0)

So that works.
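The warning mentioned above, and an alternative that keeps the factor without the round trip through character, can be sketched on a small hypothetical factor (`f` stands in for the Code column). The alternative adds "NA" as an extra level before assigning it; it is a different technique from the one above, not a correction of it, and it relies on the same assumption that every <NA> in the column really is Namibia.

```r
# Hypothetical stand-in for the Code column after read.csv()
f <- factor(c("DE", "IT", NA, "FR"))

# Direct assignment: "NA" is not a level of f, so R warns
# "invalid factor level, NA generated" and element 3 stays <NA>.
# The handler records the warning text instead of printing it.
g <- f
msg <- NULL
withCallingHandlers(
  g[is.na(g)] <- "NA",
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
print(msg)

# Alternative: add "NA" as a level first, then assign it
k <- f
levels(k) <- c(levels(k), "NA")
k[is.na(k)] <- "NA"

print(k)
print(which(is.na(k)))  # no missing values left
```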

There ought to be an option in read.csv() and friends which suppresses
the conversion of a string "NA" found in input into an <NA> value.
Maybe there is -- but, if so, it is not visible in the documentation!
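For what it is worth, the read.table() help page (which also covers read.csv()) does describe an na.strings argument: a character vector of strings to be interpreted as NA, whose default is "NA". A minimal sketch, assuming that empty fields are the only genuine missing values in the file, so that na.strings can be set to "" instead:

```r
# Recreate temp.csv in a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("Code,Country", "DE,Germany", "IT,Italy",
             "NA,Namibia", "FR,France"), tmp)

# Treat only empty fields as missing, so the literal NA stays a string
X2 <- read.csv(tmp, na.strings = "")
print(X2)
print(which(is.na(X2)))
```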

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 18-Jul-10                                       Time: 09:25:05
------------------------------ XFMail ------------------------------


