[Rd] Unexpected result of as.character() and unlist() applied to a data frame

Heinz Tuechler tuechler at gmx.at
Wed Mar 28 13:16:49 CEST 2007


At 17:25 27.03.2007 +0200, Martin Maechler wrote:
>>>>>> "Herve" == Herve Pages <hpages at fhcrc.org>
>>>>>>     on Mon, 26 Mar 2007 20:48:33 -0700 writes:
>
>    Herve> Hi,
>    >> dd <- data.frame(A=c("b","c","a"), B=3:1)
>    >> dd
>    Herve>   A B
>    Herve> 1 b 3
>    Herve> 2 c 2
>    Herve> 3 a 1
>    >> unlist(dd)
>    Herve> A1 A2 A3 B1 B2 B3
>    Herve>  2  3  1  3  2  1
>
>    Herve> Someone else might get something different. It all
>    Herve> depends on the value of their 'stringsAsFactors' option:
>
>yes, and I don't like that (last) fact either.
>IMO, an option should never be allowed to influence such a basic
>function as  data.frame().
>
>I know I would have had time earlier to start discussing this,
>but for some (probably good) reasons, I didn't get to it at the
>time. 
>As Andy comments, everything is behaving as it should / is documented,
>including the  'stringsAsFactors' option;
>but personally, I really would want to consider changing
>the default for  data.frame()'s stringsAsFactors back (as
>pre-R-2.4.0) to 'TRUE' instead of  default.stringsAsFactors()
>which is a smart version of getOption("stringsAsFactors"). 
>I find it ok ("acceptable") if it's influencing  read.table()
>but feel differently for data.frame().
>
>Martin
>
Martin!

I see the problem with options influencing "such a basic function as
data.frame()", but in my view the difficulty starts earlier. In my
understanding, data.frame() is _the_ basic way to store empirical source
data in R, and I found the earlier default behaviour of changing character
variables to factors problematic.
If changing character variables to factors were only an internal process,
not visible to the user, I would not mind. But to include a character
variable in a data frame and get a factor out of it is somewhat disturbing.
A naive user like me was especially confused by the fact that I could read
an SPSS file with spss.get (default: charfactor=FALSE) and get a character
variable in a data.frame as a character variable, but when I then put it
into a different data.frame, it changed to a factor.
I would wish for a data.frame() function that behaves as a "data container"
with the idea of rows (= cases) and columns (= variables), but without
changing the mode/class of the objects.
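
A minimal sketch of the behaviour described above. It sets
stringsAsFactors explicitly, so the result does not depend on the
session's option or on the R version. The names x, d1, d2 and d3 are
illustrative only, and the use of I() is one possible way to protect a
column, not necessarily the recommended one:

```r
# A character vector, as one might obtain from an import routine
x <- c("b", "c", "a")

# With stringsAsFactors=TRUE the character column is converted to a factor
d1 <- data.frame(A = x, B = 3:1, stringsAsFactors = TRUE)
is.factor(d1$A)      # TRUE

# With stringsAsFactors=FALSE the column stays character
d2 <- data.frame(A = x, B = 3:1, stringsAsFactors = FALSE)
is.character(d2$A)   # TRUE

# Wrapping the vector in I() protects it from conversion,
# even when stringsAsFactors=TRUE
d3 <- data.frame(A = I(x), B = 3:1, stringsAsFactors = TRUE)
is.character(d3$A)   # TRUE
```

Only d2 and d3 behave like the "data container" wished for here, in
that the character column comes back out as character data.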

Heinz

>
>
>
>
>    >> dd2 <- data.frame(A=c("b","c","a"), B=3:1,
>    >>                   stringsAsFactors=FALSE)
>    >> dd2
>    Herve>   A B
>    Herve> 1 b 3
>    Herve> 2 c 2
>    Herve> 3 a 1
>    >> unlist(dd2)
>    Herve>  A1  A2  A3  B1  B2  B3
>    Herve> "b" "c" "a" "3" "2" "1"
>
>    Herve> Same thing with as.character:
>
>    >> as.character(dd)
>    Herve> [1] "c(2, 3, 1)" "c(3, 2, 1)"
>    >> as.character(dd2)
>    Herve> [1] "c(\"b\", \"c\", \"a\")" "c(3, 2, 1)"
>
>    Herve> Bug or "feature"?
>
>    Herve> Note that as.character applied directly on dd$A
>    Herve> doesn't have this "feature":
>
>    >> as.character(dd$A)
>    Herve> [1] "b" "c" "a"
>    >> as.character(dd2$A)
>    Herve> [1] "b" "c" "a"
>
>    Herve> Cheers, H.
>
>    Herve> ______________________________________________
>    Herve> R-devel at r-project.org mailing list
>    Herve> https://stat.ethz.ch/mailman/listinfo/r-devel
>


