merge.data.frame can coerce character vectors to factor in some circumstances (PR#1608)

David James dj@research.bell-labs.com
Wed, 29 May 2002 09:32:43 -0400


I do use data frames for storing character data, fully aware that I'm
stretching their intended use.  Data frames came about in the context
of modeling software (see "Statistical Models in S," the "White book"
by Chambers and Hastie, eds).  Originally, the primary use of data
frames was for holding the data given to the model fitting functions,
and thus the classes of objects that LM, GLM, GAM, Tree, etc.,
required are simple ones (numeric and factors -- note that character
vectors are not well suited for fitting models).  Very soon after,
people began to include other types of objects (Terry Therneau's
censored/survival classes, among others, come to my mind).  So the
behaviour of the data.frame class has evolved into what we are
currently using, and some of its apparent "idiosyncrasies" make
perfect sense in light of its original intended purpose.

It has been argued before that we may need other more general
container classes to hold other "tabular" data (e.g., contingency
tables, data from relational databases) that don't require the
restriction that data frames have traditionally imposed.  Of course,
it is not obvious to me that introducing yet another set of classes
is necessarily a good thing --- a lot of care and thought would have
to be put into the effort to ensure that any new container classes (or
any other type, for that matter) are well designed and have a clear
purpose, just as data frames were well designed for the purpose
of holding data for fitting models.

David Kane wrote:
> Prof Brian D Ripley writes:
>  > I have already patiently explained it to you.  It is a side issue of
>  > subscripting of data frames converting character columns to factor.
>  > I have also given you a workaround.
> 
> Yes, and many thanks for the patient explanations! Indeed, thanks to you and
> the rest of R core for a simply amazing piece of software. Even though our
> budget would allow us to use most any statistics package around, we use R
> because it seems to us to be the best.
> 
> My only comment on the workaround (using I() to create vectors of class AsIs)
> is that it is largely undocumented. I was concerned that, in this case,
> undocumented meant "discouraged from use and possibly not present in future
> versions." In any event, we will now do as you suggested.
> 
>  > As I said before, this is a consequence of the general rules.  Data frames
>  > are not designed to have character columns, and those who insist on using
>  > them must make themselves aware of the consequences.
> 
> Ahh! It had never been clear to me that data frames are not "designed to have
> character columns." Of course, now that I carefully (re)read the documentation,
> I see that this is the case. My comment here is that R certainly provides
> enough rope (colClasses as "character" in read.table, for example) for
> unsuspecting users like me to hang themselves on this point. Indeed, I would
> wager that the vast majority of users (even some members of R core?) have
> dataframes with character variables in them. My question is: Why aren't
> dataframes "designed" to have character columns? This would seem to be a
> desirable feature . . . but perhaps I am misunderstanding what a dataframe
> really "is". Or, perhaps the answer is: "Patches are accepted." ;-)
> 
> Thanks again,
> 
> Dave Kane
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

-- 
David A. James
Statistics Research, Room 2C-253            Phone:  (908) 582-3082       
Bell Labs, Lucent Technologies              Fax:    (908) 582-3340
Murray Hill, NJ 09794-0636