[Rd] should `data` respect default.stringsAsFactors()?

peter dalgaard pdalgd at gmail.com
Fri Feb 19 16:23:19 CET 2016


On 19 Feb 2016, at 16:02 , Cook, Malcolm <MEC at stowers.org> wrote:

> Hi,
> 
>> Aha... Hadn't noticed that stringsAsFactors only works via as.is in read.table.
>> 
>> Yes, the doc should probably be fixed. The code probably not 
> 
> Agreed.  
> 
> Is someone on-list authorized and willing to make the documentation change?  I suppose I could learn what it takes to be a "player", but for such a trivial fix, it probably is overkill.  Dissenting opinions?

I have fixed it for r-devel.

-pd
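
For the archive, a minimal sketch of why the option is ignored: in read.table(), stringsAsFactors only supplies the default for as.is, so an explicit as.is = FALSE (which, per Joshua's reading quoted below, is what data() passes) takes precedence. The temporary file and the toy data frame are made up for illustration and do not come from the thread.

```r
# Write a small table to a temporary file (hypothetical data, for illustration only).
tmp <- tempfile(fileext = ".tab")
df <- data.frame(x = 1:3, cook = c("rare", "medium", "well-done"))
write.table(df, tmp)

# stringsAsFactors only provides the default value of as.is ...
a <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)
class(a$cook)   # "character"

# ... so an explicit as.is = FALSE wins, whatever stringsAsFactors says.
b <- read.table(tmp, header = TRUE, as.is = FALSE, stringsAsFactors = FALSE)
class(b$cook)   # "factor"
```

Passing stringsAsFactors explicitly, rather than via options(), keeps the sketch independent of the session's global setting.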

> 
>> -- packages
>> loading different data sets depending on user options is an even worse idea
>> than having the option in the first place... (I don't mean having the possibility, I
>> mean the default.stringsAsFactors thing).
>> 
>> In general, read.table() gets many things wrong
> 
> I agree with you that "read.table() gets many things wrong" and I too have my favorite workarounds - but that was not my concern.  My concern is that data() does not work as documented.
> 
> ~Malcolm
> 
>> , if you don't set switches
>> and/or postprocess. E.g., even when you do intend to read factors, the
>> alphabetical level order is often not desired. My favourite workaround for
>> data() is to drop a corresponding foo.R file in the ./data directory. This will be
>> run in preference to loading foo.txt (or foo.csv, etc) and can contain, like,
>> 
>> dd <- read.table("foo.txt",.....)
>> dd$cook <- factor(dd$cook, levels=c("rare","medium","well-done"))
>> 
>> etc.
>> 
>> -pd
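
A self-contained sketch of the foo.R workaround quoted above, assuming a throwaway working directory with a ./data subdirectory. The names foo.txt and foo.R follow the quoted example; the directory name, the toy data, and the header = TRUE / as.is = TRUE arguments are illustrative assumptions, not what the elided arguments stood for.

```r
# Throwaway working directory with a ./data subdirectory (hypothetical layout).
wd <- file.path(tempdir(), "data-demo")
dir.create(file.path(wd, "data"), recursive = TRUE, showWarnings = FALSE)
old <- setwd(wd)

# The raw table, plus a foo.R script that reads and post-processes it.
# data() sources the script from inside ./data, so the relative path works.
writeLines(c("cook", "rare", "medium", "well-done"), "data/foo.txt")
writeLines(c(
  'foo <- read.table("foo.txt", header = TRUE, as.is = TRUE)',
  'foo$cook <- factor(foo$cook, levels = c("rare", "medium", "well-done"))'
), "data/foo.R")

# data() prefers foo.R over foo.txt, so the levels come out in the intended order.
data(foo, envir = environment())
levels(foo$cook)   # "rare" "medium" "well-done"

setwd(old)
```

Had foo.txt been read directly, the levels would be alphabetical ("medium" first), so the level order shows that the script was the file that ran.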
>> 
>> 
>> 
>>> On 19 Feb 2016, at 01:39 , Joshua Ulrich <josh.m.ulrich at gmail.com> wrote:
>>> 
>>> On Thu, Feb 18, 2016 at 6:03 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>>>> Hi Peter,
>>>> 
>>>> Sorry if I was not clear.  Perhaps an example will make my point:
>>>> 
>>>>> data(iris)
>>>>> class(iris$Species)
>>>> [1] "factor"
>>>>> write.table(iris,'data/myiris.tab')
>>>>> data(myiris)
>>>>> class(myiris$Species)
>>>> [1] "factor"
>>>>> rm(myiris)
>>>>> options(stringsAsFactors = FALSE)
>>>>> data(myiris)
>>>>> class(myiris$Species)
>>>> [1] "factor"
>>>>> myiris<-read.table("data/myiris.tab",header=TRUE)
>>>>> class(myiris$Species)
>>>> [1] "character"
>>>> 
>>>> I am surprised to find that, in the above,
>>>>       setting the global option stringsAsFactors = FALSE does NOT affect how Species is read in by the `data` function
>>>> whereas
>>>>       setting the global option stringsAsFactors = FALSE DOES affect how Species is read in by read.table
>>>> 
>>>> especially since data is documented as calling read.table.
>>>> 
>>> To be explicit, it's documented as calling read.table(..., header =
>>> TRUE) in this case, but it actually calls read.table(..., header =
>>> TRUE, as.is = FALSE), which results in class(myiris$Species) of
>>> "factor".
>>> 
>>> R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
>>> R> class(myiris$Species)
>>> [1] "factor"
>>> 
>>> So it seems like adding as.is = FALSE to the call in the documentation
>>> would clear this up.
>>> 
>>>> In my opinion, one or the other should change (the behavior of data, or the documentation).
>>>> 
>>>> <bleep> <bleep>,
>>>> 
>>>> ~ Malcolm
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: peter dalgaard [mailto:pdalgd at gmail.com]
>>>>> Sent: Thursday, February 18, 2016 3:32 PM
>>>>> To: Cook, Malcolm <MEC at stowers.org>
>>>>> Cc: r-devel at stat.math.ethz.ch
>>>>> Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
>>>>> 
>>>>> What the <bleep> are you on about? data() does many things, only some of
>>>>> which call read.table() et al., and the ones that do have no special treatment
>>>>> of stringsAsFactors.
>>>>> 
>>>>> -pd
>>>>> 
>>>>>> On 18 Feb 2016, at 21:25 , Cook, Malcolm <MEC at stowers.org> wrote:
>>>>>> 
>>>>>> Hiya,
>>>>>> 
>>>>>> Probably been debated elsewhere....
>>>>>> 
>>>>>> I note that R's `data` function does not respect default.stringsAsFactors
>>>>>> 
>>>>>> By my lights, it should, especially as it is documented to call read.table, which DOES respect.
>>>>>> 
>>>>>> Oh, but:  http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-tp921891p921893.html
>>>>>> 
>>>>>> Compelling.  I have to agree.
>>>>>> 
>>>>>> So, I change my mind.
>>>>>> 
>>>>>> By my lights, `data` should then be documented to NOT respect default.stringsAsFactors.
>>>>>> 
>>>>>> Else?
>>>>>> 
>>>>>> ~Malcolm Cook
>>>>>> 
>>>>>> ______________________________________________
>>>>>> R-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>> 
>>>>> --
>>>>> Peter Dalgaard, Professor,
>>>>> Center for Statistics, Copenhagen Business School
>>>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>>>> Phone: (+45)38153501
>>>>> Office: A 4.23
>>>>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Joshua Ulrich  |  about.me/joshuaulrich
>>> FOSS Trading  |  www.fosstrading.com
>>> R/Finance 2016 | www.rinfinance.com
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


