[R] read columns of quoted numbers as factors

peter dalgaard pdalgd at gmail.com
Tue Oct 5 13:25:52 CEST 2010


On Oct 4, 2010, at 18:39 , james hirschorn wrote:

> Suppose I have a data file (possibly with a huge number of columns), where the 
> columns with factors are coded as "1", "2", "3", etc ... The default behavior of 
> read.table is to convert these columns to integer vectors. 
> 
> Is there a way to get read.table to recognize that columns of quoted numbers 
> represent factors (while unquoted numbers are interpreted as integers), without 
> explicitly setting them with colClasses ?

I don't think there's a simple way, because the modus operandi of read.table is to read everything as character and then see whether it can be converted to numeric, and at that point any quotes will have been lost.

One possibility, somewhat dependent on the exact file format, would be to temporarily set quote="", see which columns contain quote characters, and, on a second pass, read those columns as factors, using a computed colClasses argument. It will break down if you have space-separated columns with quoted multi-word strings, though.
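
A rough sketch of that two-pass idea follows. The helper name read_quoted_as_factor is made up for illustration, and the sketch assumes a comma-separated file with a header line and double quotes as the only quote character. It is untested on large files and reads the whole file twice.

```r
# Hypothetical helper: two-pass read that turns quoted columns into factors.
read_quoted_as_factor <- function(file, sep = ",", header = TRUE) {
  # Pass 1: disable quote processing and read everything as character,
  # so the literal quote characters survive inside the field values.
  raw <- read.table(file, sep = sep, header = header, quote = "",
                    colClasses = "character")

  # A column counts as quoted if any of its fields contains a double quote.
  quoted <- vapply(raw, function(x) any(grepl('"', x, fixed = TRUE)),
                   logical(1))

  # Pass 2: quoted columns are read as factors; NA leaves the remaining
  # columns to read.table's usual type conversion. The names are dropped
  # because the pass-1 column names may be mangled by the stray quotes,
  # and named colClasses would be matched by name rather than position.
  col_classes <- unname(ifelse(quoted, "factor", NA_character_))
  read.table(file, sep = sep, header = header, colClasses = col_classes)
}
```

Note that any quoted column becomes a factor under this scheme, including quoted text columns, not only quoted numbers. With whitespace as the separator and quoted multi-word strings, the first pass would see the wrong number of fields, which is the breakdown mentioned above.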



-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


