[Rd] should `data` respect default.stringsAsFactors()?

peter dalgaard pdalgd at gmail.com
Fri Feb 19 10:02:11 CET 2016


Aha... Hadn't noticed that stringsAsFactors only works via as.is in read.table. 
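
A minimal sketch of that interaction, assuming a hypothetical two-column table written to a temporary file. Both arguments are passed explicitly so the result does not depend on the user's options or on the R version's default:

```r
# Hypothetical two-column table written to a temporary file
tab_path <- tempfile(fileext = ".tab")
writeLines(c("x s", "1 a", "2 b", "3 c"), tab_path)

# stringsAsFactors alone: as.is defaults to !stringsAsFactors, i.e. TRUE here
by_saf <- read.table(tab_path, header = TRUE, stringsAsFactors = FALSE)

# An explicit as.is takes precedence, so stringsAsFactors is ignored
by_asis <- read.table(tab_path, header = TRUE, as.is = FALSE,
                      stringsAsFactors = FALSE)

class(by_saf$s)   # "character"
class(by_asis$s)  # "factor"
```

This is the same mechanism that makes data() return factors regardless of the option, since it passes as.is = FALSE itself.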

Yes, the doc should probably be fixed. The code probably not -- packages loading different data sets depending on user options is an even worse idea than having the option in the first place... (I don't mean having the possibility, I mean the default.stringsAsFactors() thing). 

In general, read.table() gets many things wrong, if you don't set switches and/or postprocess. E.g., even when you do intend to read factors, the alphabetical level order is often not desired. My favourite workaround for data() is to drop a corresponding foo.R file in the ./data directory. This will be run in preference to loading foo.txt (or foo.csv, etc) and can contain, like, 

dd <- read.table("foo.txt",.....) 
dd$cook <- factor(dd$cook, levels=c("rare","medium","well-done"))

etc.
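
As a self-contained sketch of what such a foo.R could look like: the file contents, the column names, and the use of a temporary file in place of data/foo.txt are all assumptions for illustration. In a real package the script would simply read "foo.txt", since data() sources .R files with the working directory set to the data directory.

```r
# Hypothetical stand-in for data/foo.txt, written to a temporary file
foo_path <- tempfile(fileext = ".txt")
writeLines(c("id cook", "1 medium", "2 rare", "3 well-done", "4 rare"),
           foo_path)

# Read strings as plain character, independent of any user option
dd <- read.table(foo_path, header = TRUE, stringsAsFactors = FALSE)

# Impose the intended level order instead of the alphabetical default
dd$cook <- factor(dd$cook, levels = c("rare", "medium", "well-done"))

levels(dd$cook)  # "rare" "medium" "well-done"
```

Reading as character first and converting afterwards keeps the level order under the package author's control rather than the user's options.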

-pd



> On 19 Feb 2016, at 01:39 , Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
> 
> On Thu, Feb 18, 2016 at 6:03 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>> Hi Peter,
>> 
>> Sorry if I was not clear.  Perhaps an example will make my point:
>> 
>>> data(iris)
>>> class(iris$Species)
>> [1] "factor"
>>> write.table(iris,'data/myiris.tab')
>>> data(myiris)
>>> class(myiris$Species)
>> [1] "factor"
>>> rm(myiris)
>>> options(stringsAsFactors = FALSE)
>>> data(myiris)
>>> class(myiris$Species)
>> [1] "factor"
>>> myiris<-read.table("data/myiris.tab",header=TRUE)
>>> class(myiris$Species)
>> [1] "character"
>> 
>> I am surprised to find that in the above
>>          setting the global option stringsAsFactors = FALSE does NOT affect how Species is being read in by the `data` function
>> whereas
>>        setting the global option stringsAsFactors = FALSE DOES affect how Species is being read in by read.table
>> 
>> especially since data is documented as calling read.table.
>> 
> To be explicit, it's documented as calling read.table(..., header =
> TRUE) in this case, but it actually calls read.table(..., header =
> TRUE, as.is = FALSE), which results in class(myiris$Species) of
> "factor".
> 
> R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
> R> class(myiris$Species)
> [1] "factor"
> 
> So it seems like adding as.is = FALSE to the call in the documentation
> would clear this up.
> 
>> In my opinion, one or the other should change (the behavior of data, or the documentation).
>> 
>> <bleep> <bleep>,
>> 
>> ~ Malcolm
>> 
>> 
>>> -----Original Message-----
>>> From: peter dalgaard [mailto:pdalgd at gmail.com]
>>> Sent: Thursday, February 18, 2016 3:32 PM
>>> To: Cook, Malcolm <MEC at stowers.org>
>>> Cc: r-devel at stat.math.ethz.ch
>>> Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
>>> 
>>> What the <bleep> are you on about? data() does many things, only some of
>>> which call read.table() et al., and the ones that do have no special treatment
>>> of stringsAsFactors.
>>> 
>>> -pd
>>> 
>>>> On 18 Feb 2016, at 21:25 , Cook, Malcolm <MEC at stowers.org> wrote:
>>>> 
>>>> Hiya,
>>>> 
>>>> Probably been debated elsewhere....
>>>> 
>>>> I note that R's `data` function does not respect default.stringsAsFactors
>>>> 
>>>> By my lights, it should, especially as it is documented to call read.table, which DOES respect.
>>>> 
>>>> Oh, but:  http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-tp921891p921893.html
>>>> 
>>>> Compelling.  I have to agree.
>>>> 
>>>> So, I change my mind.
>>>> 
>>>> By my lights, `data` should then be documented to NOT respect default.stringsAsFactors.
>>>> 
>>>> Else?
>>>> 
>>>> ~Malcolm Cook
>>>> 
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>> 
> 
> 
> 
> -- 
> Joshua Ulrich  |  about.me/joshuaulrich
> FOSS Trading  |  www.fosstrading.com
> R/Finance 2016 | www.rinfinance.com

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


