[Rd] as.data.frame.table() does not recognize default.stringsAsFactors()

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Thu Mar 14 17:40:53 CET 2019


>>>>> peter dalgaard 
>>>>>     on Thu, 14 Mar 2019 16:18:55 +0100 writes:

    > I have no recollection of the original rationale for as.data.frame.table, but I actually think it is fine as it is: 
    > The classifying _factors_ of a crosstable should be factors unless very specifically directed otherwise and that should not depend on the setting of an option that controls the conversion of character data. 

    > For as.data.frame.matrix, in contrast, it is the _content_ of the matrix that is being converted, and it seems much more reasonable to follow the same path as for other character data.

    > -pd

I very strongly agree that as.data.frame.table() should not be
changed to follow a global option.
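
A minimal sketch of the distinction Peter draws, with stringsAsFactors
passed explicitly so that the outcome does not depend on any option
setting. The object names are made up for illustration only.

```r
x <- c("a", "b", "c", "a", "b")
tab <- table(x)

# Classifying variable of a cross-table: factor unless explicitly overridden
d_default <- as.data.frame(tab)
d_chr     <- as.data.frame(tab, stringsAsFactors = FALSE)
class(d_default$x)  # "factor"
class(d_chr$x)      # "character"

# Character matrix: the *content* is converted, so the argument decides
m <- matrix(c("a", "b", "c", "d"), nrow = 2,
            dimnames = list(NULL, c("u", "v")))
m_fac <- as.data.frame(m, stringsAsFactors = TRUE)
m_chr <- as.data.frame(m, stringsAsFactors = FALSE)
class(m_fac$u)      # "factor"
class(m_chr$u)      # "character"
```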

On the contrary: I've repeatedly mentioned that, in my view, it
has been a design mistake to allow data.frame() and as.data.frame()
to be influenced by a global option
 [and we should've tried harder to keep things purely functional
   (R remaining as closely as possible a "functional language"), 
  e.g. by providing wrapper functions the same way we have such
  wrappers for versions of read.table() with different defaults
  for some of the arguments
 ]
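
As a rough sketch of what such wrappers could look like, in the spirit of
read.csv() and read.delim() (which are wrappers of read.table() with
different argument defaults): the names data_frame_chr() and
data_frame_fct() below are hypothetical and do not exist in base R.

```r
# Hypothetical wrappers (not part of base R): each fixes stringsAsFactors
# in its definition, so the result never depends on a global option.
data_frame_chr <- function(...) data.frame(..., stringsAsFactors = FALSE)
data_frame_fct <- function(...) data.frame(..., stringsAsFactors = TRUE)

d1 <- data_frame_chr(id = 1:3, g = c("a", "b", "a"))
d2 <- data_frame_fct(id = 1:3, g = c("a", "b", "a"))

class(d1$g)  # "character"
class(d2$g)  # "factor"
```

With this design the choice is visible at the call site. Passing
stringsAsFactors again through ... to one of these wrappers would be an
error, since the argument would then be supplied twice.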

Martin


    >> On 12 Mar 2019, at 21:39 , Mychaleckyj, Josyf C (jcm6t) <jcm6t using virginia.edu> wrote:
    >> 
    >> Reporting a possible inconsistency or bug in handling stringsAsFactors in as.data.frame.table()
    >> 
    >> Here is a simple test
    >> 
    >>> options()$stringsAsFactors
    >> [1] TRUE
    >>> x<-c("a","b","c","a","b")
    >>> d<-as.data.frame(table(x))
    >>> d
    >>   x Freq
    >> 1 a    2
    >> 2 b    2
    >> 3 c    1
    >>> class(d$x)
    >> [1] "factor"
    >>> d2<-as.data.frame(table(x),stringsAsFactors=F)
    >>> class(d2$x)
    >> [1] "character"
    >>> options(stringsAsFactors=F)
    >>> options()$stringsAsFactors
    >> [1] FALSE
    >>> d3<-as.data.frame(table(x))
    >>> d3
    >>   x Freq
    >> 1 a    2
    >> 2 b    2
    >> 3 c    1
    >>> class(d3$x)
    >> [1] "factor"
    >>> d4<-as.data.frame(table(x),stringsAsFactors=F)
    >>> class(d4$x)
    >> [1] "character"
    >> 
    >> 
    >> # Display the code showing the different stringsAsFactors handling in table and matrix:
    >> 
    >>> as.data.frame.table
    >> function (x, row.names = NULL, ..., responseName = "Freq", stringsAsFactors = TRUE,
    >>     sep = "", base = list(LETTERS))
    >> {
    >>     ex <- quote(data.frame(do.call("expand.grid", c(dimnames(provideDimnames(x,
    >>         sep = sep, base = base)), KEEP.OUT.ATTRS = FALSE, stringsAsFactors = stringsAsFactors)),
    >>         Freq = c(x), row.names = row.names))
    >>     names(ex)[3L] <- responseName
    >>     eval(ex)
    >> }
    >> <bytecode: 0x28769f8>
    >> <environment: namespace:base>
    >> 
    >>> as.data.frame.matrix
    >> function (x, row.names = NULL, optional = FALSE, make.names = TRUE,
    >>     ..., stringsAsFactors = default.stringsAsFactors())
    >> {
    >>     d <- dim(x)
    >>     nrows <- d[[1L]]
    >>     ncols <- d[[2L]]
    >>     ic <- seq_len(ncols)
    >>     dn <- dimnames(x)
    >>     if (is.null(row.names))
    >>         row.names <- dn[[1L]]
    >>     collabs <- dn[[2L]]
    >>     if (any(empty <- !nzchar(collabs)))
    >>         collabs[empty] <- paste0("V", ic)[empty]
    >>     value <- vector("list", ncols)
    >>     if (mode(x) == "character" && stringsAsFactors) {
    >>         for (i in ic) value[[i]] <- as.factor(x[, i])
    >>     }
    >>     else {
    >>         for (i in ic) value[[i]] <- as.vector(x[, i])
    >>     }
    >>     autoRN <- (is.null(row.names) || length(row.names) != nrows)
    >>     if (length(collabs) == ncols)
    >>         names(value) <- collabs
    >>     else if (!optional)
    >>         names(value) <- paste0("V", ic)
    >>     class(value) <- "data.frame"
    >>     if (autoRN)
    >>         attr(value, "row.names") <- .set_row_names(nrows)
    >>     else .rowNamesDF(value, make.names = make.names) <- row.names
    >>     value
    >> }
    >> <bytecode: 0x29995c0>
    >> <environment: namespace:base>
    >> 
    >> 
    >>> sessionInfo()
    >> R version 3.5.2 (2018-12-20)
    >> Platform: x86_64-pc-linux-gnu (64-bit)
    >> Running under: CentOS Linux 7 (Core)
    >> 
    >> Matrix products: default
    >> BLAS: /usr/lib64/libblas.so.3.4.2
    >> LAPACK: /usr/lib64/liblapack.so.3.4.2
    >> 
    >> locale:
    >> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
    >> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
    >> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
    >> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
    >> [9] LC_ADDRESS=C               LC_TELEPHONE=C
    >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
    >> 
    >> attached base packages:
    >> [1] stats     graphics  grDevices utils     datasets  methods   base
    >> 
    >> loaded via a namespace (and not attached):
    >> [1] compiler_3.5.2 tools_3.5.2
    >> 
    >> Thanks,
    >> Joe
    >> 
    >> 
    >> ______________________________________________
    >> R-devel using r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel

    > -- 
    > Peter Dalgaard, Professor,
    > Center for Statistics, Copenhagen Business School
    > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
    > Phone: (+45)38153501
    > Office: A 4.23
    > Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com



