[R] Convert factor to numeric vector of labels

John Kane jrkrideau at yahoo.ca
Wed Aug 15 16:04:51 CEST 2007


My reason for setting stringsAsFactors = FALSE is more
that I really dislike having R convert what I "think"
are character variables to factors when I import data.


I suspect that it takes quite a few new users by
surprise that what they had intended to be a character
variable has become a factor. And it can take a long
time to track down the problem if you're a newbie.
-------------------------------------------------

A quick (overly simple) example where I had intended
the data in the second column to be character. 

Original data found at
http://ca.geocities.com/jrkrideau/R/facts.txt

1, b
1, b
3, b
3, b
4, a
4, a
3, a

options(stringsAsFactors = TRUE)

df <- read.csv("http://ca.geocities.com/jrkrideau/R/facts.txt")
df[,2]
[1]  b  b  b  a  a  a
Levels:  a  b


options(stringsAsFactors = FALSE)

df <- read.csv("http://ca.geocities.com/jrkrideau/R/facts.txt")
df[,2]
[1] " b" " b" " b" " a" " a" " a"
-----------------------------------------------------
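
For anyone who wants to try the comparison without fetching the file,
here is a minimal sketch that feeds the same seven lines to read.csv()
through its text argument. The object names (txt, as_factor, as_char,
clean) are only illustrative. It passes stringsAsFactors in the call
itself, so the result does not depend on the global option or on the
default of whatever R version is in use. It also uses header = FALSE;
read.csv() defaults to header = TRUE, which would explain why the
output above shows six values for seven data lines.

```r
# Inline copy of the seven data lines shown above
# (note the blank after each comma).
txt <- "1, b
1, b
3, b
3, b
4, a
4, a
3, a"

# Same data, read once as factor and once as character.
as_factor <- read.csv(text = txt, header = FALSE, stringsAsFactors = TRUE)
as_char   <- read.csv(text = txt, header = FALSE, stringsAsFactors = FALSE)

class(as_factor[, 2])   # "factor"
class(as_char[, 2])     # "character"

# strip.white = TRUE removes the leading blank, so the values
# come back as "b" and "a" rather than " b" and " a".
clean <- read.csv(text = txt, header = FALSE, stringsAsFactors = FALSE,
                  strip.white = TRUE)
unique(clean[, 2])
```

Setting the argument per call also sidesteps the portability worry
about changing the global option, since the script then behaves the
same on someone else's machine.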

There are probably good reasons for setting the
default either way, and while I am currently strongly
of the FALSE persuasion, I can see some serious
problems with changing the default, particularly when
most existing code will assume TRUE.

It might be that a "Why are my character variables
turning into factors" entry, as a complement to "How do
I convert factors to numeric" in the FAQ, would be
sufficient.  As it is, the reader knows what seems to
have happened but has no clue as to why or how
it is happening.

If enough people run into numeric variables being
imported as factors, and the existence of the FAQ entry
suggests this is not a rare problem, a note about the
default might be worthwhile in both FAQ entries.
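
To make the numeric case concrete, here is a small sketch of what the
original poster describes below. The factor f is made up for
illustration, using the sample values from his question; the two
conversions are the ones given in the FAQ entry on converting factors
to numeric.

```r
# Hypothetical column: the values are numbers, but it arrived as a factor.
f <- factor(c("0.71", "1.34", "2.61", "1.34"))

# as.numeric() on a factor returns the internal integer codes,
# not the values printed as labels.
codes <- as.numeric(f)                          # 1 2 3 2

# Going through character recovers the values themselves.
vals <- as.numeric(as.character(f))             # 0.71 1.34 2.61 1.34

# Equivalent, but converts each distinct level only once.
vals2 <- as.numeric(levels(f))[as.integer(f)]   # 0.71 1.34 2.61 1.34
```

If a column that should be numeric comes in as a factor, it is usually
worth checking levels(f) first, since a stray non-numeric entry in the
file is a common cause.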



--- Matthew Keller <mckellercran at gmail.com> wrote:

> Hi all,
> 
> If we, the R community, are endeavoring to make R user friendly
> (gasp!), I think that one of the first places to start would be in
> setting stringsAsFactors = FALSE. Several times I've run into
> instances of folks decrying R's "ridiculous usage of memory" in
> reading data, only to come to find out that these folks were
> unknowingly importing certain columns as factors. The fix is easy once
> you know it, but it isn't obvious to new users, and I'd bet that it
> turns some % of people off of the program. Factors are not used often
> enough to justify this default behavior in my opinion. When factors
> are used, the user knows to treat the variable as a factor, and so it
> can be done on a case-by-case (or should I say variable-by-variable?)
> basis.
> 
> Is this a default that should be changed?
> 
> Matt
> 
> 

> > This is one of R's rather _endearing_ little
> > idiosyncrasies. I ran into it a while ago.
> >
> > http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98090.html
> >
> > For some reason, possibly historical, the option
> > "stringsAsFactors" is set to TRUE.
> >
> > As Prof Ripley says, FAQ 7.10 will tell you
> > as.numeric(as.character(f)) # for a one-off conversion
> >
> > From Gabor Grothendieck: a one-off solution for a
> > complete data.frame
> >
> > DF <- data.frame(let = letters[1:3], num = 1:3,
> >  stringsAsFactors = FALSE)
> >
> > str(DF)  # to see what has happened.
> >
> > You can reset the option globally, see below. However
> > you might want to read Gabor Grothendieck's comment
> > about this in the thread referenced above since it
> > could cause problems if you transfer files a lot.
> >
> > Personally I went with the global option since I don't
> > tend to transfer programs to other people and I was
> > getting tired of tracking down errors in my programs
> > caused by numeric and character variables suddenly
> > deciding to become factors.
> >
> > From Steven Tucker:
> >
> > You can also set this option globally with
> >  options(stringsAsFactors = FALSE)  # in
> > \library\base\R\Rprofile
> >
> > --- Falk Lieder <falk.lieder at googlemail.com> wrote:
> >
> > > Hi,
> > >
> > > I have imported a data file to R. Unfortunately R
> > > has interpreted some numeric variables as factors.
> > > Therefore I want to reconvert these to numeric
> > > vectors whose values are the factor levels' labels.
> > > I tried
> > > as.numeric(<factor>),
> > > but it returns a vector of factor levels (i.e.
> > > 1,2,3,...) instead of labels
> > > (i.e. 0.71, 1.34, 2.61,…).
> > > What can I do instead?
> > >
> > > Best wishes, Falk
> >
> >
> 
> 
> -- 
> Matthew C Keller
> Postdoctoral Fellow
> Virginia Institute for Psychiatric and Behavioral
> Genetics
>


