[Rd] irrelevant warning message

Terry Therneau therneau at mayo.edu
Tue Jan 13 14:28:33 CET 2009

Thanks for the replies:

> and got a warning in all R versions I tried back to 2.4.1.  In 2.3.1 
> this was an error.

  It seems I have egg on my face with respect to this point.  A more accurate 
summary of what I saw would have been this: (1) I had never noticed this in R 
before, and (2) until recently I did all my modeling in Splus or Bell S, where 
character vectors always worked.  (My survival routines were always more up to 
date in Splus because that is what I use for the source code.  But conversion 
from a local CVS archive to R-Forge is nearly done, with only a survexp.us 
ratetable issue remaining, so R will become my most current version in another 
day or two.)  Possibly I don't have any character variables as covariates in 
the survival test suite.
> I think R's handling of character vectors has progressed to the point
> where they should be the norm, not the exception.  Maybe others will
> have different views.

  Factors are very useful when there is a small, discrete set of levels, and I 
use them moderately often.  In that case most of the default behavior of 
factors makes perfect sense, e.g., retention of levels.  I'm very sure that 
adding stringsAsFactors to the system options was a good thing, but less sure 
that defaulting it to FALSE is the best choice for all users.
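
A minimal R sketch of the level-retention behavior mentioned above; the variable `sex` and its values are invented for illustration. Subsetting a factor keeps its full set of levels, so tabulations still report empty categories, while `droplevels()` (available in current versions of R) discards unused levels when that is wanted instead.

```r
# Illustrative factor with an explicit, fixed set of three levels.
sex <- factor(c("female", "male", "female"),
              levels = c("female", "male", "unknown"))

# Subsetting keeps every level, even those no longer present.
f_only <- sex[sex == "female"]
levels(f_only)    # "female" "male" "unknown"
table(f_only)     # empty levels appear with zero counts

# Drop the unused levels explicitly when they are not wanted.
levels(droplevels(f_only))
```
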
   In my world most of the data come from formal processes: clinical trials, 
databases, large studies that use dedicated keyed entry, etc.  The most common 
character variables are things like id, name, and address, for which the factor 
paradigm doesn't work, and most of the variables I get that are actually 
'factors' come to me as small integers; I turn them into factors using both the 
levels and labels arguments.  Thus autoconversion is just a PITA.  But my world 
is not everyone's.
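
A minimal sketch of that workflow, assuming a hypothetical data frame `trial` with a character `id` column and an integer-coded `arm` column (the codes and labels are invented for illustration). The character column is kept as-is by passing `stringsAsFactors = FALSE` to `data.frame()`, and the integer codes are converted explicitly through the `levels` and `labels` arguments of `factor()`.

```r
# Hypothetical trial extract: identifiers stay character,
# arm arrives as small integer codes (1, 2, 3).
trial <- data.frame(
  id  = c("A-001", "A-002", "A-003", "A-004"),
  arm = c(2L, 1L, 3L, 1L),
  stringsAsFactors = FALSE   # keep id as a plain character vector
)

# Explicit conversion: levels lists the codes, labels names them in order.
trial$arm <- factor(trial$arm,
                    levels = 1:3,
                    labels = c("placebo", "low dose", "high dose"))

str(trial)
```

Because `levels` fixes the code-to-label mapping, a code that happens to be absent from a particular extract still keeps its label and its place in the level order.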
   My main complaint with factors has always been the assumption that everything 
should be turned into one.  I fought that battle with Splus.  Default behavior 
often reflects the data sets being analysed at the time the code was written, 
and factors reflect the data sets in the Chambers & Hastie book.  But then, 
my survival code has some defaults with exactly the same origin...
