[R] Coercing character to factor

Thomas Lumley thomas at biostat.washington.edu
Wed Mar 8 23:39:15 CET 2000


On Wed, 8 Mar 2000, Marc Feldesman wrote:

> I just downloaded version 1.0.0 and several binary libraries (VR, rpart, 
> norm, stataread) - WinNT version.  I then converted a file from Stata 6.0 
> to R format by using the stataread library.  The file converts perfectly 
> and I was able to use the VR function lda on the dataframe without 
> difficulty.  I then tried to use the same dataframe with RPART.  The model 
> statement:
> 
> test.rp<-rpart(genus~x+y+z+a+b+c, data=mydata) fails with the following error:
> 
> Error in model.frame(formula, rownames, variables, varnames, extras, 
> extranames,  :
>          invalid variable type
> 
> (the identical model statement works perfectly in lda)
> 
> I've traced the error to how RPART (or R) deals with the dependent variable 
> "genus", which is converted from a Stata file to an R file as a "character" 
> variable.

Yes. stataread reads Stata string variables as character.  I think this is
the right thing to do, since Stata recommends that you use numeric
variables with labels if you really just have factors and generally
doesn't encourage the use of strings.
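
To see which columns came through as character, something along these
lines should do. This is only a sketch: the data frame below is a made-up
stand-in for the one stataread produced, and its values are invented.

```r
# Hypothetical stand-in for a data frame produced by stataread
mydata <- data.frame(genus = c("Homo", "Pan"), x = c(1.2, 3.4),
                     stringsAsFactors = FALSE)

# Class of each column; "character" marks a Stata string variable
col_classes <- sapply(mydata, class)
print(col_classes)
```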

model.frame doesn't allow character variables.  It would be possible for
model.frame to coerce characters to factors, but it currently doesn't. We
would certainly recommend that you explicitly coerce strings to factors
rather than having it happen automatically, but it may be that model.frame
should handle strings as a fallback position (perhaps with a warning).
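
A minimal sketch of the explicit coercion, assuming a recent version of R
(vapply did not exist in 1.0.0). The data frame and its values are
hypothetical; only the column name genus comes from the question.

```r
# Hypothetical data standing in for the data frame read by stataread
mydata <- data.frame(
  genus = c("Homo", "Pan", "Homo", "Gorilla"),
  x = c(1.2, 3.4, 2.2, 5.1),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each one to a factor,
# leaving numeric columns untouched
is_chr <- vapply(mydata, is.character, logical(1))
mydata[is_chr] <- lapply(mydata[is_chr], factor)

str(mydata)
```

Converting by type rather than by position (as in mydata[,2]) means the
code keeps working if the column order changes.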

rpart is completely blameless :)


	-thomas

Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle
	

> 
> The model statement works fine if I do:
> 
> test.rp<-rpart(as.factor(genus)~x+y+z+a+b+c, data=mydata)
> 
> or
> 
> mydata[,2]<-as.factor(mydata[,2])
> test.rp<-rpart(genus~x+y+z+a+b+c, data=mydata)
> 
> Is this an R, RPART, or stataread issue?  Where did I think I read that R 
> coerced character variables to factors if the context called for factor 
> variables?
> 
> 
> =====================
> Dr. Marc R. Feldesman
> Professor and Chairman
> Anthropology Department
> Portland State University
> 1721 SW Broadway
> Portland, Oregon 97201
> email:  feldesmanm at pdx.edu
> phone:  503-725-3081
> fax:    503-725-3905
> http://odin.cc.pdx.edu/~h1mf
> ======================
> 
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> 
