[Rd] as.data.frame.table() does not recognize default.stringsAsFactors()

peter dalgaard pdalgd at gmail.com
Fri Mar 15 14:31:08 CET 2019


My point was that, in a table, the rows and columns usually have a well-defined order. If you convert the table to data frame form, typically in order to fit a Poisson GLM, you do want to preserve that order, and not have the levels converted to a locale-dependent alphabetical order in your analyses. Or at least, if you do want conversion to character, you should say so very explicitly. That is the way it currently works: you can override it, just not via the global option.

Notice also that it is very easy to do as.character(factor) if you need it, whereas it is rather more painful to convert a character vector back to a factor whose levels follow the dimnames of the corresponding margin of the original table.
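A short R illustration of the point above. This is a sketch, assuming the documented default stringsAsFactors = TRUE of as.data.frame.table(); the table `tab` and its dimnames (`dose`, `response`) are made up for the example.

```r
# Hypothetical 2 x 3 contingency table; the dimnames are deliberately
# given in a meaningful, non-alphabetical order.
tab <- as.table(matrix(c(5, 3, 8, 2, 7, 4), nrow = 2,
                       dimnames = list(dose = c("low", "high"),
                                       response = c("none", "some", "full"))))

# as.data.frame() dispatches to as.data.frame.table(); the classifying
# variables become factors whose levels follow the table's dimnames.
df <- as.data.frame(tab)
levels(df$dose)       # "low"  "high"
levels(df$response)   # "none" "some" "full"

# Explicit opt-out, passed as an argument rather than via options():
df_chr <- as.data.frame(tab, stringsAsFactors = FALSE)
class(df_chr$dose)    # "character"

# Factor -> character is a single call ...
stopifnot(identical(as.character(df$dose), df_chr$dose))

# ... whereas character -> factor needs the original dimnames to recover
# the intended level order (factor() alone would sort the levels).
df_chr$dose <- factor(df_chr$dose, levels = dimnames(tab)$dose)
levels(df_chr$dose)   # "low"  "high"
```

Note that a plain factor(df_chr$dose) would give the levels "high" "low", i.e. sorted rather than in the table's order.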

-pd

> On 15 Mar 2019, at 13:13, Therneau, Terry M., Ph.D. via R-devel <r-devel using r-project.org> wrote:
> 
> I have to disagree with both Peter and Martin on this.
> 
> The underlying issue is that the automatic conversion of characters to factors by the
> data.frame functions was the single most egregious design blunder in the Statistical
> Models in S book, and we are still living with it.  The stringsAsFactors option was a
> compromise to let users opt out of that mistake (one I had to fight hard for).  In that
> light I read Peter's defense as "but in this case we really DO know better than the user,
> and won't let them opt out", and Martin's as "they shouldn't have been able to opt out in
> the first place, so weaken it at every opportunity".
> 
> I generally agree that global options should be minimal.  But if one exists, let's be 
> consistent and listen to it.
> 
> (Footnote: In the Mayo Biostat group, stringsAsFactors=FALSE is the recommended global 
> option for all users.  It's a pure cost/productivity thing.  We work on thousands of data 
> sets in a year, and the errors and misunderstandings that silent conversions generate far 
> outweigh any benefits.)
> 
> Terry T.
> 
> 
> On 3/15/19 6:00 AM, r-devel-request using r-project.org wrote:
>>> I have no recollection of the original rationale for as.data.frame.table, but I actually think it is fine as it is:
>>> The classifying _factors_ of a crosstable should be factors unless very specifically directed otherwise, and that should not depend on the setting of an option that controls the conversion of character data.
>> 
>>> For as.data.frame.matrix, in contrast, it is the _content_ of the matrix that is being converted, and it seems much more reasonable to follow the same path as for other character data.
>> 
>>> -pd
>> 
>> I very strongly agree that as.data.frame.table() should not be
>> changed to follow a global option.
>> 
>> To the contrary: I've repeatedly mentioned that in my view it
>> has been a design mistake to allow data.frame() and as.data.frame() to be influenced
>> by a global option
>>  [and we should've tried harder to keep things purely functional
>>    (R remaining as closely as possible a "functional language"),
>>   e.g. by providing wrapper functions the same way we have such
>>   wrappers for versions of read.table() with different defaults
>>   for some of the arguments
>>  ]
> 
> 
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com

