FW: [R] Newbie struggling with "factors"
Warnes, Gregory R
gregory_r_warnes at groton.pfizer.com
Fri Mar 29 17:56:38 CET 2002
Hint #1: to do any useful transformations on your variables, you will
probably need to convert them temporarily into character variables (aka
strings). Do that with as.character(), for example as.character(n$OSUSE).
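A minimal sketch of why the conversion matters, using a made-up factor `f` as a stand-in for a column like n$OSUSE (the names `f`, `s`, `codes` and `parts` are illustrative only). as.character() gives back the text of each response, while as.integer() on a factor gives only the internal level codes:

```r
# Hypothetical stand-in for a survey column that was read in as a factor
f <- factor(c("1", "1,3", "1,2,3"))

# as.character() recovers the original strings
s <- as.character(f)

# as.integer() on a factor returns the internal level codes, not the text
codes <- as.integer(f)

# Once the values are strings, each response can be split on the comma
parts <- strsplit(s, ",")
```

If the factor step is not needed at all, read.table() also accepts as.is = TRUE, which keeps character columns as strings when the data are read.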
You will probably want to convert each of the variables that are in this
format into a set of logical (TRUE/FALSE) variables, one per check box.
Something like this:
n <- data.frame(OSUSE = c("1","1,3","1,2,3"))
# as.character() is needed because strsplit() does not accept a factor
n$OSUSE.Windows   <- sapply( strsplit(as.character(n$OSUSE), ","),
                             function(X) "1" %in% X )
n$OSUSE.Macintosh <- sapply( strsplit(as.character(n$OSUSE), ","),
                             function(X) "2" %in% X )
n$OSUSE.Unix      <- sapply( strsplit(as.character(n$OSUSE), ","),
                             function(X) "3" %in% X )
Alternatively, if you often have variables like this, you might consider
creating a new object type that extends factor and that includes the
operations that you need.
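### Start Sample Code ###
# Constructor: store the box names and mark X as a "checklist" factor
checklist <- function(X, boxnames) {
  X <- as.factor(X)
  attr(X, "boxnames") <- boxnames
  class(X) <- c("checklist","factor")
  X
}
# TRUE for each response in which the box called 'name' was checked
contains <- function(X, name) {
  name <- pmatch( name, attr(X,"boxnames") )
  retval <- sapply( strsplit(as.character(X), ","), function(X) name %in% X )
  retval
}
# Number of boxes checked in each response
numchecked <- function(X) {
  retval <- sapply( strsplit(as.character(X), ","), length )
  retval
}
# Per-box count and proportion of responses with that box checked
summary.checklist <- function(x, ...) {
  sum <- apply( as.matrix(x), 2, sum )
  mean <- apply( as.matrix(x), 2, mean )
  rbind(sum, mean)
}
# Logical matrix with one row per response and one column per box
as.matrix.checklist <- function(x, ...) {
  sapply( attr(x, "boxnames"), function(YY) contains(x, YY) )
}
### End Sample Code ###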
Here are some examples of using these functions:
> n <- data.frame(OSUSE = c("1","1,3","1,2,3"))
> n$OSUSE <- checklist(n$OSUSE, c("Windows","Macintosh","Unix"))
# Check if OSUSE includes a specific OS
> contains( n$OSUSE, "Windows")
[1] TRUE TRUE TRUE
> contains( n$OSUSE, "Macintosh")
[1] FALSE FALSE  TRUE
> contains( n$OSUSE, "Unix")
[1] FALSE  TRUE  TRUE
# Compute the number of checked items per response
# (take mean() of this for the average)
> numchecked( n$OSUSE )
[1] 1 2 3
# Create a matrix showing whether each box was checked or not
> as.matrix( n$OSUSE )
     Windows Macintosh  Unix
[1,]    TRUE     FALSE FALSE
[2,]    TRUE     FALSE  TRUE
[3,]    TRUE      TRUE  TRUE
# Show some summary info
> summary( n$OSUSE )
     Windows Macintosh      Unix
sum        3 1.0000000 2.0000000
mean       1 0.3333333 0.6666667
Of course, you'll want to modify these classes to suit your needs. A little
time up front can help a lot.
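For the two questions in the original message (which box is checked most often, and how many boxes are checked on average), a short sketch that needs no new class. It assumes the same toy responses as in the examples; the names `responses`, `checked`, `counts`, `most_common` and `avg_checked` are illustrative only:

```r
# Toy data from the examples; with real data use as.character(n$OSUSE)
responses <- c("1", "1,3", "1,2,3")

# Split each response into the individual codes that were checked
checked <- strsplit(responses, ",")

# How often each code was checked across all responses
counts <- table(unlist(checked))

# Code of the most frequently checked box
most_common <- names(counts)[which.max(counts)]

# Average number of boxes checked per response
avg_checked <- mean(sapply(checked, length))
```

Note that which.max() reports only the first maximum, so ties between boxes would need a separate check.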
If you like, I'll include these classes and any enhancements that you make
in my 'gregmisc' library.
> -----Original Message-----
> From: Tom Arnold [mailto:thomas_l_arnold at yahoo.com]
> Sent: Friday, March 29, 2002 8:59 AM
> To: R
> Subject: [R] Newbie struggling with "factors"
> I am processing some survey results, and my data are
> being read in as "factors". I don't know how to
> process these things in any way.
> To start with, several of the survey questions are
> multi-choice check boxes on the original (web-based)
> survey, as in "check all that apply".
> These are encoded as numbers. For example, if the
> survey has a question:
> Which operating systems have you used? (Check all that apply)
> [ ]Windows
> [ ]Macintosh
> [ ]Unix
> ...then the data exported for three different
> responses might look like
> ...where ";" is the field delimiter.
> I use read.table to get the data in. I read all the
> survey data into a table "n" and the field above is
> called "OSUSE". When I query R about the field, it
> tells me it is class "factor"
> > class(n$OSUSE)
> [1] "factor"
> > mode(n$OSUSE)
> [1] "numeric"
> I'd like to be able to do some simple things like:
> what is the most common item checked (1, 2, or 3?)
> What is the average number of boxes checked?
> But I can't find any way to manipulate this "factor"
> field. What's the secret?
> Tom Arnold
> Summit Media Partners
> Visit our web site at http://www.summitmediapartners.com
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch