[R] expected behavior when parsing lines with special characters

David Wolfskill david at catwhisker.org
Tue Feb 15 18:26:16 CET 2011


On Tue, Feb 15, 2011 at 12:21:18PM -0500, Robert M. Flight wrote:
> Say I have a tab-delimited table I want to read into R. What should I
> expect to happen if some of the entries contain the character " ' "? I
> thought it would read the file fine, but that is not what happens.
> Instead, all the values in between two " ' "s get read into one field,
> and things are just seriously messed up. Is this a bug, and besides
> removing the offending characters, is there a fix?
> 
> Example Input file:
> 
> testFile.txt:
> 3499	9031	424823	COP'B2	118094989	XP_422637.2
> 3499	7955	114454	copb2	50080158	NP_001001940.1
> 3499	7227	45757	betaCop	24584107	NP_524836.2
> ...
> 
> testDat <- read.table('testFile.txt',sep='\t')
> testDat

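This is documented behaviour rather than a bug: by default read.table()
treats both " and ' as quoting characters (quote = "\"'"), so the
apostrophe in COP'B2 opens a quoted string and everything up to the next
apostrophe is read as a single field. I believe you want to disable
quoting: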

testDat <- read.table('testFile.txt',sep='\t',quote="")

Ref.:

   quote: the set of quoting characters. To disable quoting altogether,
          use 'quote = ""'.  See 'scan' for the behaviour on quotes
          embedded in quotes.  Quoting is only considered for columns
          read as character, which is all of them unless 'colClasses'
          is specified.
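
As a quick check, here is a minimal sketch that recreates the three
sample rows from the question in a temporary file and reads them back
with quoting disabled. The temporary file and the names rows and tmp are
my own additions for illustration; stringsAsFactors = FALSE is only
there so the fourth column comes back as plain character data.

```r
# Recreate the three sample rows from the question (tab-separated).
rows <- c(
  "3499\t9031\t424823\tCOP'B2\t118094989\tXP_422637.2",
  "3499\t7955\t114454\tcopb2\t50080158\tNP_001001940.1",
  "3499\t7227\t45757\tbetaCop\t24584107\tNP_524836.2"
)
tmp <- tempfile(fileext = ".txt")   # hypothetical stand-in for testFile.txt
writeLines(rows, tmp)

# With quoting disabled, the apostrophe is treated as ordinary data.
testDat <- read.table(tmp, sep = "\t", quote = "", stringsAsFactors = FALSE)

dim(testDat)      # rows and columns actually read
testDat[1, 4]     # the field that contains the apostrophe

# count.fields() reports how many fields each line is split into,
# which is a handy way to spot lines that are being mis-parsed.
count.fields(tmp, sep = "\t", quote = "")
```

Running count.fields() on the real file, once with the default quote
setting and once with quote = "", should show which lines are affected.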

>...

Peace,
david
-- 
David H. Wolfskill				david at catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.