[Rd] irrelevant warning message
therneau at mayo.edu
Tue Jan 13 14:28:33 CET 2009
Thanks for the replies:
> and got a warning in all R versions I tried back to 2.4.1. In 2.3.1
> this was an error.
It seems I have egg on my face wrt this point. A more accurate synopsis of what I
saw should have been that 1. I've never noticed this in R before and 2. Until
recently I did all my modeling in Splus or Bell S, and character vectors always
worked there. (My survival routines were always more up to date in Splus
because that's what I use for the source code. But conversion from a local cvs
archive to Rforge is nearly done -- just a survexp.us ratetable issue remains --
so R will become my most current version in another day or two.) Possibly I
don't have any character variables as covariates in the survival test suite.
> I think R's handling of character vectors has progressed to the point
> where they should be the norm, not the exception. Maybe others will
> have different views.
Factors are very useful when there is a small discrete number of levels, and I
use them moderately often. For that case, most of the default behavior of
factors makes perfect sense, e.g., retention of levels. I'm very sure that
adding stringsAsFactors to the system options was a good thing, not as sure that
defaulting it to FALSE is the best thing for all users.
In my world most of the data comes from formal processes: clinical trials,
databases, large studies that use dedicated keyed entry, etc. The most common
character variables are things like id, name, and address for which the factor
paradigm doesn't work, and most of the variables I get that are actually
'factors' come to me as small integers; I turn them into factors using both the
levels and labels arguments. Thus autoconversion is just a PITA. But my world
is not everyone's.
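
To make that workflow concrete, here is a minimal sketch of what it might look
like. The data frame, column names, codes and labels are all made up for
illustration; only factor() with its levels and labels arguments, and the
stringsAsFactors argument of data.frame(), are standard R.

```r
# Hypothetical data: the ids and treatment codes are invented for illustration.
dat <- data.frame(
  id  = c("A001", "A002", "A003", "A004"),
  arm = c(1, 2, 2, 1),
  stringsAsFactors = FALSE   # keep id as a character vector
)

# Turn the small-integer code into a factor, naming every level explicitly.
dat$arm <- factor(dat$arm,
                  levels = c(1, 2, 3),
                  labels = c("placebo", "low dose", "high dose"))

str(dat)         # id stays character, arm is a factor with 3 levels
table(dat$arm)   # the unused level "high dose" is retained with a count of 0
```

Spelling out levels and labels keeps the coding under the analyst's control,
and the id column is left alone rather than being converted automatically.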
My main complaint with factors has always been the assumption that everything
should be turned into one. I fought that battle with Splus. Default behavior
is often a reflection of the data sets being analysed at the time the code was
written, and factors reflect the data sets in the Chambers & Hastie book. But then,
my survival code has some defaults with exactly the same origin...