[R] A slight trap in read.table/read.csv.

David Winsemius dwinsemius at comcast.net
Sun Feb 28 23:33:41 CET 2010


On Feb 28, 2010, at 4:55 PM, Rolf Turner wrote:

>
> I had occasion recently to read in a *.csv file with a header and a
> single data line, which looked like:
>
> "CandidateName","NSN","Ethnicity","dob","gender"
> "Smith, Mary Jane",111222333,"E","2/25/1989","F"
>
> That "F" (for female) in the last field got transformed to
> FALSE.  Apparently read.csv (and hence read.table) infers that
> a field is logical if all of its entries are F's and T's.
>
> If I change the file to
>
> "CandidateName","NSN","Ethnicity","dob","gender"
> "Smith, Mary Jane",111222333,"E","2/25/1989","F"
> "Mingdinkler, Melvin Queue",999888777,"01/04/1942","M"
>
> then the read functions correctly interpret the last field
> as being character.
>
> The translation of "F" into FALSE resulted in some mysterious
> contretemps in further analysis, which it took me a while to
> track down.
>
> I solved the problem by putting in a colClasses argument in my
> call to read.csv().  But I really think that the read functions
> are being too clever by half here.  If field entries are surrounded
> by quotes, shouldn't they be left as character?  Even if they are
> all F's and T's?
>
> Furthermore, using T's and F's to represent TRUEs and FALSEs is
> bad practice anyway.  Since TRUE and FALSE are reserved words, it
> would make sense for the read functions to assume that a field is
> logical if it consists entirely of these words.  But T's and F's
> .... I don't think so.
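
For anyone who wants to see the behaviour described above, here is a
minimal sketch. It assumes a version of R whose read.csv() accepts a
`text` argument and a named colClasses vector; the data are the two
lines from the post, supplied inline rather than from a file, and the
object names (csv_text, d1, d2) are only for illustration.

```r
# The header and single data line from the original post, supplied inline.
csv_text <- paste(
  '"CandidateName","NSN","Ethnicity","dob","gender"',
  '"Smith, Mary Jane",111222333,"E","2/25/1989","F"',
  sep = "\n"
)

# Default behaviour: a column whose only entry is "F" is converted to logical,
# even though the field was quoted in the file.
d1 <- read.csv(text = csv_text)
class(d1$gender)   # "logical"
d1$gender          # FALSE

# Workaround mentioned in the post: declare the column class explicitly.
# With a named colClasses vector, the unnamed columns are still auto-detected.
d2 <- read.csv(text = csv_text, colClasses = c(gender = "character"))
class(d2$gender)   # "character"
d2$gender          # "F"
```

Naming only the gender column in colClasses leaves type detection in
place for the other fields, which may or may not be what is wanted.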

It is documented that conversion to logical will be attempted, so it
does make sense that T/F would become TRUE and FALSE, since that is
typical behavior elsewhere. But at the very least this sentence in the
type.convert help page:

"Given a character vector, it attempts to convert it to logical,
integer, numeric or complex, and failing that converts it to factor
unless as.is = TRUE."

... ought to be clarified. It is not at all clear from that wording
that conversion to logical will still be attempted even if as.is=TRUE,
i.e. that the only conversion as.is=TRUE suppresses is the one to factor.
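
A small sketch of what I mean, calling type.convert() directly on
made-up character vectors (the names tf, fm and so on are only for
illustration):

```r
# Character vectors standing in for a column as read from a file.
tf <- c("F", "T", "F")
fm <- c("F", "M")

# as.is = TRUE does not stop the conversion to logical ...
tf_asis <- type.convert(tf, as.is = TRUE)
class(tf_asis)     # "logical"

# ... it only decides what happens when no conversion succeeds:
fm_asis   <- type.convert(fm, as.is = TRUE)    # left as character
fm_factor <- type.convert(fm, as.is = FALSE)   # converted to factor
class(fm_asis)     # "character"
class(fm_factor)   # "factor"
```

So as.is only governs the fallback step; to keep an all-T/F column as
character, the class has to be declared, e.g. via colClasses in
read.csv().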

>
> I would argue that this behaviour should be changed.  I can see no
> downside to such a change.
>
> 	cheers,
>
> 		Rolf Turner


