[R] Newbie struggling with "factors"

Jonathan Baron baron at cattell.psych.upenn.edu
Fri Mar 29 16:10:00 CET 2002

On 03/29/02 05:58, Tom Arnold wrote:
>I am processing some survey results, and my data are
>being read in as "factors". I don't know how to
>process these things in any way.

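You can convert them to numbers with as.numeric(), but note that
on a factor this returns the internal level codes, not the values
shown in the labels; use as.numeric(as.character(x)) to get the
latter.  That doesn't seem to be your immediate problem, though.
Factors are what some other programs call "categorical variables."
Their ordering doesn't mean anything.  They are just labels.
Sometimes that _is_ exactly what you want.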
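
To make the difference concrete, here is a minimal sketch.  The
factor f and its values are made up for illustration; they are not
from your data.

```r
# Hypothetical factor whose labels happen to look like numbers.
f <- factor(c("10", "20", "10", "30"))

# as.numeric() applied directly to a factor returns the internal
# level codes (the position of each label among the sorted levels).
codes <- as.numeric(f)

# Going through as.character() first converts the labels themselves.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

The two results differ, which is why the direct conversion can
silently give you the wrong numbers.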

You can also use various options on read.table() to control how
things are coded.  (See the help on it.)  But that doesn't seem
to be your problem either.
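
For instance, something like the following sketch keeps such columns
as plain character strings.  The column names ID and AGE and all the
values are hypothetical, invented only so the example runs; only
OSUSE and the ";" delimiter come from your message.

```r
# Hypothetical three-response export with ";" between fields.
raw <- c("ID;OSUSE;AGE",
         "1;1,2,3;34",
         "2;1;27",
         "3;2,3;45")

# sep = ";" sets the field delimiter; as.is = TRUE keeps
# non-numeric columns as character instead of making them factors.
n <- read.table(text = raw, sep = ";", header = TRUE, as.is = TRUE)

print(class(n$OSUSE))
print(class(n$AGE))
```

With a real file you would pass the file name in place of the text
argument.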

>To start with, several of the survey questions are
>multi-choice check boxes on the original (web-based)
>survey, as in "check all that apply".
>These are encoded as numbers. For example, if the
>survey has a question:
>Which operating systems have you used? (Check all that apply)
>[ ]Windows
>[ ]Macintosh
>[ ]Unix
>...then the data exported for three different
>responses might look like
>...where ";" is the field delimiter. 

I assume you have told read.table() that ";" is the delimiter,
with the appropriate option setting.  But this would mean, I
think, that you would have some fields that look like "1,2,3".
This is probably why you are getting factors: "1,2,3" is not a
number, so R assumes it is part of a factor.  To extract the
numbers from things like this, you might have to use things like
substr() and strsplit().
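
A rough sketch of the strsplit() route, assuming the field arrives
as a factor of comma-separated codes.  The vector osuse and its
values are invented for illustration.

```r
# Hypothetical check-box field: a factor of comma-separated codes.
osuse <- factor(c("1,2,3", "1", "2,3"))

# strsplit() needs character input, so convert the factor first.
# fixed = TRUE treats "," as a literal string, not a regular expression.
parts <- strsplit(as.character(osuse), ",", fixed = TRUE)

# Turn each respondent's pieces into numbers; the result is a list
# with one numeric vector per respondent.
checked <- lapply(parts, as.numeric)

print(checked)
```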

>I use read.table to get the data in. I read all the
>survey data into a table "n" and the field above is
>called "OSUSE". When I query R about the field, it
>tells me it is class "factor"
>> class(n$OSUSE)
>[1] "factor"
>> mode(n$OSUSE)
>[1] "numeric"

I'm surprised at this.  I would think it would be character.

>I'd like to be able to do some simple things like:
>what is the most common item checked (1, 2, or 3?)
>What is the average number of boxes checked?
>But I can't find any way to manipulate this "factor"
>field. What's the secret?

The secret is the various string manipulation commands, such as
strsplit(), possibly also with apply() or loops.
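
As one possible sketch of both questions, assuming the field has
already been converted to character.  The responses in osuse are
made up; substitute as.character(n$OSUSE) for real data.

```r
# Hypothetical responses, one string per respondent.
osuse <- c("1,2,3", "1", "1,3")

# Split each response into the individual boxes checked.
checked <- strsplit(osuse, ",", fixed = TRUE)

# Most common item: pool all checked boxes and tabulate them.
counts <- table(unlist(checked))
most_common <- names(counts)[as.vector(counts) == max(counts)]

# Average number of boxes checked per respondent.
avg_boxes <- mean(sapply(checked, length))

print(counts)
print(most_common)
print(avg_boxes)
```

If several items tie for the highest count, most_common holds all
of them.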

But, more generally, you might rethink the way you do your
questionnaires or have your data reported.  I do this kind of
research, and I try to avoid checkboxes or select boxes for just
this reason.  (I tend to use buttons and then write JavaScript
code to read the buttons and turn their results into something
nice and regular.)  Time saved in writing the web page is lost
when you try to analyze the data.  You might also try various
pre-processing of your data.

In particular, what you might aim for is to have a data file for
each respondent with all commas or all semicolons.  A response
like "1,2" would be represented as "1,2,0".  That is, you would
pad the data to fill up the right number of fields.
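
If you would rather do the padding in R than in a pre-processing
step, a sketch along these lines might work.  The helper pad(), the
constant n_boxes, and the sample responses are all hypothetical.

```r
# Hypothetical responses and an assumed number of check boxes.
osuse <- c("1,2,3", "1,2", "3")
n_boxes <- 3

parts <- strsplit(osuse, ",", fixed = TRUE)

# Hypothetical helper: convert to numbers and pad with zeros
# on the right up to the requested width.
pad <- function(x, width) {
  x <- as.numeric(x)
  c(x, rep(0, width - length(x)))
}

# One row per respondent, one column per field.
padded <- t(sapply(parts, pad, width = n_boxes))

# The same data written back out as regular comma-separated strings.
padded_text <- apply(padded, 1, paste, collapse = ",")

print(padded)
print(padded_text)
```

Once every row has the same number of fields, the usual numeric
tools apply directly to the columns of padded.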

Now, again, you can, if you want, handle all this with string
manipulation in R, and perhaps you can develop general tools that
you can use over and over.  But, if you are just starting, you
might consider the alternatives too.