[R] A slight trap in read.table/read.csv.

Don MacQueen macq at llnl.gov
Mon Mar 1 00:28:06 CET 2010


There is, however, an important distinction.

Quoting from ?TRUE  (or ?logical):

'TRUE' and 'FALSE' are reserved words denoting logical constants
      in the R language, whereas 'T' and 'F' are global variables whose
      initial values set to these.  All four are 'logical(1)' vectors.

>  TRUE <- 3
Error in TRUE <- 3 : invalid (do_set) left-hand side to assignment

In other words, the rule is
   T is TRUE unless otherwise defined by the user
(ditto for F)

So this rule apparently applies to input from a file. Using 
colClasses is then an example of "otherwise defined by the user."
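
A minimal sketch of that point follows. The inline data is modelled on 
the file quoted below, and the names csv_text, guessed and declared are 
illustrative assumptions only; it also assumes a version of R in which 
read.csv() accepts the text= argument.

```r
# Hypothetical one-row input modelled on the file quoted further down.
csv_text <- '"CandidateName","NSN","Ethnicity","dob","gender"
"Smith, Mary Jane",111222333,"E","2/25/1989","F"'

# Default behaviour: column types are guessed from the contents, so a
# column holding only "F" (or "T") comes back as logical, quotes or not.
guessed <- read.csv(text = csv_text)
class(guessed$gender)

# Declaring the class of that column keeps it as character; the
# unnamed columns are still guessed as usual.
declared <- read.csv(text = csv_text,
                     colClasses = c(gender = "character"))
class(declared$gender)
```

With a single row of "F" the first call should report "logical" and the 
second "character", which is the user overriding the default.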

I think it's logical (pun not particularly intended) and consistent 
(though perhaps not ideal, but that's another question...)

-Don


At 5:37 PM -0500 2/28/10, Gabor Grothendieck wrote:
>It is strange.  Even in R itself T and F are not guaranteed to be TRUE
>and FALSE.
>
>>  T <- 1:3
>>  T
>[1] 1 2 3
>
>
>On Sun, Feb 28, 2010 at 4:55 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>>
>>  I had occasion recently to read in a one-line *.csv file that
>>  looked like:
>>
>>  "CandidateName","NSN","Ethnicity","dob","gender"
>>  "Smith, Mary Jane",111222333,"E","2/25/1989","F"
>>
>>  That "F" (for female) in the last field got transformed to
>>  FALSE.  Apparently read.csv (and hence read.table) infers that
>>  a field is logical if all of its entries are F's and T's.
>>
>>  If I change the file to
>>
>>  "CandidateName","NSN","Ethnicity","dob","gender"
>>  "Smith, Mary Jane",111222333,"E","2/25/1989","F"
>>  "Mingdinkler, Melvin Queue",999888777,"01/04/1942","M"
>>
>>  then the read functions correctly interpret the last field
>>  as being character.
>>
>>  The translation of "F" into FALSE resulted in some mysterious
>>  contretemps in further analysis, which it took me a while to
>>  track down.
>>
>>  I solved the problem by putting in a colClasses argument in my
>>  call to read.csv().  But I really think that the read functions
>>  are being too clever by half here.  If field entries are surrounded
>>  by quotes, shouldn't they be left as character?  Even if they are
>>  all F's and T's?
>>
>>  Furthermore, using T's and F's to represent TRUE's and FALSE's is
>>  bad practice anyway.  Since FALSE and TRUE are reserved words it
>>  would make sense for the read function to assume that a field is
>>  logical if it consists entirely of these words.  But T's and F's
>>  .... I don't think so.
>>
>>  I would argue that this behaviour should be changed.  I can see no
>>  downside to such a change.
>>
>>         cheers,
>>
>>                 Rolf Turner
>>
>>  ______________________________________________
>>  R-help at r-project.org mailing list
>>  https://stat.ethz.ch/mailman/listinfo/r-help
>>  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>  and provide commented, minimal, self-contained, reproducible code.
>>


-- 
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov


