[R] read.table and double quotes in strings

Adrian Dusa dusa.adrian at gmail.com
Sun Dec 16 21:09:22 CET 2007


Dear all,

Some very wise data entry person gave me about an hour of headache, trying 
to find out why a 2000x500 data frame would not be read into R.
After much trial and error, I traced the problem to a double quote 
accidentally inserted into a string variable (some comments from an open 
question). This can be replicated by:

aa <- data.frame(id=1:2, var1=c("some \" quote", "without quote"))
> aa
  id          var1
1  1  some " quote
2  2 without quote

Saving this with R:
write.table(aa, "aa.dat", sep="\t", row.names=FALSE)

creates the following ASCII file (between #s)

### R export
"id"	"var1"
1	"some \" quote"
2	"without quote"
###

which triggers a warning when I try to load it back:

> bb <- read.table("aa.dat", sep="\t", header=T)
Warning message:
In read.table("aa.dat", sep = "\t", header = T) :
  incomplete final line found by readTableHeader on 'aa.dat'

The dataframe was initially an SPSS file, which saved it as tab delimited in 
this format:

### SPSS export
"id"	"var1"
1	"some " quote"
2	"without quote"
###

which of course threw the same obvious error.

StatTransfer was the only software that solved the problem, exporting the 
SPSS file to a tab-delimited file that could finally be imported into R. 
The saved file looks like this:

### StatTransfer export
"id"	"var1"
1	"some "" quote"
2	"without quote"
###
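
For comparison, the doubled-quote format in the StatTransfer export can also be produced from R itself. The following is only a sketch, assuming the qmethod="double" argument of write.table and a temporary file (tmp is a made-up name); it writes the same two-row example with embedded quotes doubled instead of backslash-escaped, then reads it back with the tab separator:

```r
# Same toy data as above; stringsAsFactors is set explicitly so the
# comparison below does not depend on the R version's default.
aa <- data.frame(id = 1:2,
                 var1 = c("some \" quote", "without quote"),
                 stringsAsFactors = FALSE)

# Hypothetical temporary file used only for this illustration.
tmp <- tempfile(fileext = ".dat")

# qmethod = "double" writes an embedded quote as "" rather than \"
write.table(aa, tmp, sep = "\t", row.names = FALSE, qmethod = "double")

# Inspect the raw lines that were written.
exported <- readLines(tmp)
print(exported)

# Read the file back with the same separator.
bb <- read.table(tmp, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
print(bb)
```

If this behaves as expected, bb$var1 should match aa$var1, which suggests that the way the quote is escaped, rather than the quote itself, is what matters for a tab-separated import.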

Given these examples, I have two questions:
1. What is the correct syntax to import the R-exported file?
2. What can I do to prevent these situations from happening?
(besides whipping the data entry person :), I am referring to R procedures to 
detect and correct such things)
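
One possible way to detect such lines before importing is sketched below. It is a rough heuristic, not a definitive check: it flags every line containing an odd number of double quotes, which catches a single stray quote like the one in the SPSS export but would miss a line with two stray quotes. The function name find_unbalanced_quotes and the temporary file are made up for this illustration.

```r
# Hypothetical helper: return the numbers of the lines in a text file
# that contain an odd number of double-quote characters.
find_unbalanced_quotes <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Count the literal double quotes on each line.
  n_quotes <- lengths(regmatches(lines, gregexpr("\"", lines, fixed = TRUE)))
  which(n_quotes %% 2 == 1)
}

# Recreate the SPSS-style export from the example above in a temporary file.
tmp <- tempfile(fileext = ".dat")
writeLines(c("\"id\"\t\"var1\"",
             "1\t\"some \" quote\"",
             "2\t\"without quote\""), tmp)

# Line numbers (counting the header as line 1) that look suspicious.
bad <- find_unbalanced_quotes(tmp)
print(bad)
```

The flagged lines can then be inspected or corrected by hand, for example by doubling the stray quote, before calling read.table.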

Thank you,
Adrian


-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
          +40 21 3120210 / int.101


