[R] read columns of quoted numbers as factors

james hirschorn j_hirschorn at yahoo.com
Wed Oct 6 02:41:18 CEST 2010


Yes, your solution of setting quote="" would read the multi-word strings 
incorrectly. A more complicated version of your solution should work: First 
check which columns are identified as strings, and then apply your solution to 
the remaining columns.
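
As a rough sketch of that more complicated version (not tested beyond toy files): the helper name quoted_numeric_columns and the regular expression are my own assumptions, and it assumes a whitespace-separated file with a header line, no row-names column, no blank lines, and no escaped quotes inside fields. The raw lines are split by hand so that quoted multi-word strings stay in one field.

```r
# Hypothetical helper: find the columns that are quoted in the file but that
# read.table would still convert to numbers.
quoted_numeric_columns <- function(file, header = TRUE) {
  # Step 1: an ordinary read shows which columns are identified as strings.
  normal <- read.table(file, header = header, stringsAsFactors = FALSE)
  is_string <- vapply(normal, is.character, logical(1))

  # Step 2: split each raw line into fields, keeping "quoted fields" whole,
  # so the quote characters are still visible.
  lines <- readLines(file)
  if (header) lines <- lines[-1]
  fields <- regmatches(lines, gregexpr('"[^"]*"|[^[:space:]]+', lines))

  # A column counts as quoted if its field starts with a quote on every line.
  has_quote <- vapply(seq_along(normal), function(j) {
    all(vapply(fields, function(f) startsWith(f[j], '"'), logical(1)))
  }, logical(1))

  # Quoted columns that were not read as strings are the quoted numbers.
  unname(has_quote & !is_string)
}
```

The logical vector it returns could then be turned into a colClasses argument ("factor" where TRUE, NA elsewhere) for a final call to read.table.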

I'm a newbie at R, but it seems to me that there is a "logical inconsistency" in 
R: write.table puts quotes around numbers when they form a column of factors, 
but does not put quotes around a column of integers. Since read.table is the 
"dual" of write.table, it seems that it should treat quoted and unquoted columns 
differently, analogously to write.table. However, there does not even seem to be 
an option to make read.table behave this way.
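
A small round trip illustrates what I mean. This is only a sketch with a made-up two-column data frame (the names df, f and n are mine): the factor column is written with quotes, the integer column without, and both come back as integers.

```r
# A factor column coded with numbers, next to a plain integer column.
df <- data.frame(f = factor(c(1, 2, 3)), n = c(10L, 20L, 30L))

tmp <- tempfile(fileext = ".txt")
write.table(df, tmp, row.names = FALSE)

# The factor values are quoted in the file, the integers are not.
written <- readLines(tmp)
print(written)

# Reading the file back with the defaults drops the distinction:
# both columns are converted to integer.
back <- read.table(tmp, header = TRUE)
classes <- sapply(back, class)
print(classes)
```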


----- Original Message ----
From: peter dalgaard <pdalgd at gmail.com>
To: james hirschorn <j_hirschorn at yahoo.com>
Cc: r-help at r-project.org
Sent: Tue, October 5, 2010 7:25:52 AM
Subject: Re: [R] read columns of quoted numbers as factors


On Oct 4, 2010, at 18:39 , james hirschorn wrote:

> Suppose I have a data file (possibly with a huge number of columns), where the 
> columns with factors are coded as "1", "2", "3", etc. The default behavior of 
> read.table is to convert these columns to integer vectors. 
> 
> Is there a way to get read.table to recognize that columns of quoted numbers 
> represent factors (while unquoted numbers are interpreted as integers), without 
> explicitly setting them with colClasses?

I don't think there's a simple way, because the modus operandi of read.table is 
to read everything as character and then see whether it can be converted to 
numeric, and at that point any quotes will have been lost.

One possibility, somewhat dependent on the exact file format, would be to 
temporarily set quote="", see which columns contain quote characters, and, on a 
second pass, read those columns as factors, using a computed colClasses 
argument. It will break down if you have space-separated columns with quoted 
multi-word strings, though.
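
A minimal sketch of the two-pass idea, under stated assumptions: the function name read_quoted_as_factor is hypothetical, the file has no row-names column, and no quoted field contains the separator. Every fully quoted column becomes a factor, including columns of quoted text.

```r
# Hypothetical helper implementing the two-pass read.
read_quoted_as_factor <- function(file, header = TRUE, sep = "", ...) {
  # Pass 1: with quote = "" the quote characters stay in the field values.
  raw <- read.table(file, header = header, sep = sep, quote = "",
                    colClasses = "character", ...)

  # A column counts as quoted if every non-missing value is wrapped in quotes.
  quoted <- vapply(raw, function(x) {
    x <- x[!is.na(x)]
    length(x) > 0 && all(grepl('^".*"$', x))
  }, logical(1))

  # Pass 2: computed colClasses; NA leaves the usual type conversion in place.
  # The vector is built without names so that it is matched by position.
  cls <- rep(NA_character_, length(quoted))
  cls[quoted] <- "factor"
  read.table(file, header = header, sep = sep, colClasses = cls, ...)
}
```

For a file written by write.table(df, file, row.names = FALSE), a call such as read_quoted_as_factor(file) should give back factor columns where the values were quoted and numeric columns elsewhere.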



-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


