[Rd] should `data` respect default.stringsAsFactors()?

Cook, Malcolm MEC at stowers.org
Fri Feb 19 15:54:36 CET 2016


Joshua,

> On Thu, Feb 18, 2016 at 6:03 PM, Cook, Malcolm <MEC at stowers.org> wrote:
 > > Hi Peter,
 > >
 > > Sorry if I was not clear.  Perhaps an example will make my point:
 > >
 > >> data(iris)
 > >> class(iris$Species)
 > > [1] "factor"
 > >> write.table(iris,'data/myiris.tab')
 > >> data(myiris)
 > >> class(myiris$Species)
 > > [1] "factor"
 > >> rm(myiris)
 > >> options(stringsAsFactors = FALSE)
 > >> data(myiris)
 > >> class(myiris$Species)
 > > [1] "factor"
 > >> myiris<-read.table("data/myiris.tab",header=TRUE)
 > >> class(myiris$Species)
 > > [1] "character"
 > >
 > > I am surprised to find that in the above
 > >   setting the global option stringsAsFactors = FALSE does NOT affect
 > >   how Species is being read in by the `data` function,
 > > whereas
 > >   setting the global option stringsAsFactors = FALSE DOES affect how
 > >   Species is being read in by read.table,
 > >
 > > especially since data is documented as calling read.table.
 > >
 > To be explicit, it's documented as calling read.table(..., header =
 > TRUE) in this case, but it actually calls read.table(..., header =
 > TRUE, as.is = FALSE), which results in class(myiris$Species) of
 > "factor".

Aha - makes sense.

 > 
 > R> myiris<-read.table("data/myiris.tab",header=TRUE,as.is=FALSE)
 > R> class(myiris$Species)
 > [1] "factor"
 > 
 > So it seems like adding as.is = FALSE to the call in the documentation
 > would clear this up.

I agree - thanks for digging into the source - you have unearthed the root cause.
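
For anyone following along, here is a minimal sketch of that root cause. It is only an illustration: it writes iris to a temporary file (tab_file is a made-up name) instead of a data/ directory, and it calls read.table directly with the arguments discussed above rather than going through data(). Passing stringsAsFactors = FALSE explicitly stands in for the global option.

```r
# Write the built-in iris data to a temporary .tab file (illustrative path).
tab_file <- tempfile(fileext = ".tab")
write.table(iris, tab_file)

# The call as documented, with stringsAsFactors = FALSE standing in for
# options(stringsAsFactors = FALSE): strings stay character.
as_documented <- read.table(tab_file, header = TRUE,
                            stringsAsFactors = FALSE)

# The call as reported from the source: an explicit as.is = FALSE takes
# precedence, because stringsAsFactors only supplies the default for as.is.
as_called <- read.table(tab_file, header = TRUE, as.is = FALSE,
                        stringsAsFactors = FALSE)

class(as_documented$Species)  # "character"
class(as_called$Species)      # "factor"

unlink(tab_file)
```

So the option never gets a chance to matter once as.is is given explicitly.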

~Malcolm

 > > In my opinion, one or the other should change (the behavior of data, or the
 > > documentation).
 > >
 > > <bleep> <bleep>,
 > >
 > > ~ Malcolm
 > >
 > >
 > >  > -----Original Message-----
 > >  > From: peter dalgaard [mailto:pdalgd at gmail.com]
 > >  > Sent: Thursday, February 18, 2016 3:32 PM
 > >  > To: Cook, Malcolm <MEC at stowers.org>
 > >  > Cc: r-devel at stat.math.ethz.ch
 > >  > Subject: Re: [Rd] should `data` respect default.stringsAsFactors()?
 > >  >
 > >  > What the <bleep> are you on about? data() does many things, only some of
 > >  > which call read.table() et al., and the ones that do have no special
 > >  > treatment of stringsAsFactors.
 > >  >
 > >  > -pd
 > >  >
 > >  > > On 18 Feb 2016, at 21:25 , Cook, Malcolm <MEC at stowers.org> wrote:
 > >  > >
 > >  > > Hiya,
 > >  > >
 > >  > > Probably been debated elsewhere....
 > >  > >
 > >  > > I note that R's `data` function does not respect default.stringsAsFactors
 > >  > >
 > >  > > By my lights, it should, especially as it is documented to call read.table,
 > >  > > which DOES respect it.
 > >  > >
 > >  > > Oh, but:  http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-tp921891p921893.html
 > >  > >
 > >  > > Compelling.  I have to agree.
 > >  > >
 > >  > > So, I change my mind.
 > >  > >
 > >  > > By my lights, `data` should then be documented to NOT respect
 > >  > > default.stringsAsFactors.
 > >  > >
 > >  > > Else?
 > >  > >
 > >  > > ~Malcolm Cook
 > >  > >
 > >  > > ______________________________________________
 > >  > > R-devel at r-project.org mailing list
 > >  > > https://stat.ethz.ch/mailman/listinfo/r-devel
 > >  >
 > >  > --
 > >  > Peter Dalgaard, Professor,
 > >  > Center for Statistics, Copenhagen Business School
 > >  > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 > >  > Phone: (+45)38153501
 > >  > Office: A 4.23
 > >  > Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
 > >  >
 > 
 > 
 > 
 > --
 > Joshua Ulrich  |  about.me/joshuaulrich
 > FOSS Trading  |  www.fosstrading.com
 > R/Finance 2016 | www.rinfinance.com

