[R] Recoding a multiple response question into a series of 1, 0 variables

Philippe Grosjean phgrosjean at sciviews.org
Tue Jun 8 09:24:20 CEST 2004


Hello,
Here is a slightly more sophisticated, and fully vectorized, answer.

RecodeChoices <- function(mat) {
	# Make sure mat is a matrix (in case it is a data.frame)
	mat <- as.matrix(mat)

	# Get dimensions of the matrix
	Dim <- dim(mat)
	Nr <- Dim[1]
	Nc <- Dim[2]

	# Flatten it into a vector, but by row (need to transpose first!)
	mat <- t(mat)
	dim(mat) <- NULL

	# Offset is a vector of offsets to make locations unique in vector mat
	# (a solution to avoid loops, see Jonathan Baron's answer)
	Offset <- sort(rep(0:(Nr - 1) * Nc, Nc))

	# Initialize a vector of results of the same size with 0's
	res <- rep(0, Nr * Nc)

	# Now replace locations pointed by (mat + Offset) by 1 in res
	res[mat + Offset] <- 1

	# Transform res into a matrix of the same size as mat, filled by row
	res <- matrix(res, nrow = Nr, byrow = TRUE)

	# Return the result
	return(res)
}

# Now your example:
A <- matrix(c(4,  2, NA, NA, NA,
              1,  3,  4,  5, NA,
              3,  2, NA, NA, NA), nrow = 3, byrow = TRUE)
A
RecodeChoices(A)
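
To see why the offsets make each location unique, the intermediate vectors
can be inspected step by step for the example A. This is only an
illustrative trace of the function above; the names flat, idx and out are
introduced here and are not part of RecodeChoices().

```r
A <- matrix(c(4,  2, NA, NA, NA,
              1,  3,  4,  5, NA,
              3,  2, NA, NA, NA), nrow = 3, byrow = TRUE)
Nr <- nrow(A)
Nc <- ncol(A)

# Row-wise flattening: the answers of person 1, then person 2, ...
flat <- as.vector(t(A))

# Each person's answers are shifted by 0, Nc, 2 * Nc, ...
Offset <- rep(0:(Nr - 1) * Nc, each = Nc)
idx <- flat + Offset
idx
# idx is: 4 2 NA NA NA 6 8 9 10 NA 13 12 NA NA NA

# NA indices are skipped when a single value is assigned
res <- rep(0, Nr * Nc)
res[idx] <- 1
out <- matrix(res, nrow = Nr, byrow = TRUE)
out
```

Item 4 of person 2 lands at position 4 + 5 = 9 of the flattened result, so
it cannot collide with item 4 of person 1 at position 4.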

Depending on how you use the result, it may be preferable to recode it as
logical values (as.numeric() easily converts them back to the 1/0 coding
above). To do this, just replace:
res <- rep(0, Nr * Nc)   by   res <- rep(FALSE, Nr * Nc)
and:
res[mat + Offset] <- 1   by   res[mat + Offset] <- TRUE
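
One caveat when converting a logical matrix back to numbers: as.numeric()
drops the dim attribute, so the result is a plain vector. A small sketch,
using a made-up 2 x 2 matrix B rather than the real result, of two ways to
keep the matrix shape:

```r
# A small logical matrix standing in for the TRUE/FALSE result
B <- matrix(c(FALSE, TRUE, TRUE, FALSE), nrow = 2)

v <- as.numeric(B)   # plain 0/1 vector: the dim attribute is lost
M <- B + 0           # numeric 0/1 matrix, dimensions preserved

# Alternative: change the storage mode, which keeps all attributes
B2 <- B
storage.mode(B2) <- "double"
```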

You may also consider making the columns factors. To finalize this
function, you should add code to collect row and column names from mat and
apply them to res, and perhaps transform res into a data.frame if mat was
itself a data.frame.
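
A minimal sketch of such a finalized version follows. The function name
RecodeChoicesNamed and the "Item1", "Item2", ... column labels are
assumptions made for illustration; it also assumes that every code in mat
lies between 1 and the number of columns.

```r
# Hypothetical extension of RecodeChoices(); the name and the
# "Item" column labels are assumptions, not part of the original function.
RecodeChoicesNamed <- function(mat) {
	# Remember whether the input was a data.frame
	was.df <- is.data.frame(mat)
	mat <- as.matrix(mat)
	Nr <- nrow(mat)
	Nc <- ncol(mat)
	rnames <- rownames(mat)

	# Same offset trick, applied to the row-wise flattened data
	idx <- as.vector(t(mat)) + rep(0:(Nr - 1) * Nc, each = Nc)
	res <- rep(0, Nr * Nc)
	res[idx] <- 1

	# Rebuild the matrix by row and attach the names
	res <- matrix(res, nrow = Nr, byrow = TRUE,
	              dimnames = list(rnames, paste0("Item", seq_len(Nc))))

	# Return the same kind of object that was passed in
	if (was.df) res <- as.data.frame(res)
	res
}

# Usage with the example data, entered as a data.frame
Qx <- data.frame(Qxfirst  = c(4, 1, 3),
                 Qxsecond = c(2, 3, 2),
                 Qxthird  = c(NA, 4, NA),
                 Qxfourth = c(NA, 5, NA),
                 Qxfifth  = c(NA, NA, NA),
                 row.names = c("Person1", "Person2", "Person3"))
RecodeChoicesNamed(Qx)
```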

Best,

Philippe Grosjean

.......................................................<?}))><....
 ) ) ) ) )
( ( ( ( (   Prof. Philippe Grosjean
\  ___   )
 \/ECO\ (   Numerical Ecology of Aquatic Systems
 /\___/  )  Mons-Hainaut University, Pentagone
/ ___  /(   8, Av. du Champ de Mars, 7000 Mons, Belgium
 /NUM\/  )
 \___/\ (   phone: + 32.65.37.34.97, fax: + 32.65.37.33.12
       \ )  email: Philippe.Grosjean at umh.ac.be
 ) ) ) ) )  SciViews project coordinator (http://www.sciviews.org)
( ( ( ( (
...................................................................

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Jonathan Baron
Sent: Tuesday, 08 June, 2004 04:45
To: Greg Blevins
Cc: R-Help
Subject: Re: [R] Recoding a multiple response question into a series of
1,0 variables


On 06/07/04 21:28, Greg Blevins wrote:
>Hello R folks.
>
>1) The question that generated the data, which I call Qx:
>Which of the following 5 items have you performed in the past month?
>(multiple response)
>
>2) How the data is coded in my current dataframe:
>The first item that a person selected is coded under a field called
>Qxfirst; the second selected under Qxsecond, etc.  For the first Person,
>the NAs mean that that person only selected two of the five items.
>
>Hypothetical data is shown
>
>           Qxfirst  Qxsecond  Qxthird  Qxfourth  Qxfifth
>Person1          4         2       NA        NA       NA
>Person2          1         3        4         5       NA
>Person3          3         2       NA        NA       NA
>
>3) How I want the data to be coded:
>
>I want each field to be one of the five items and I want each field to
>contain a 1 or 0 code--1 if they mentioned the item, 0 otherwise.
>
>Given the above data, the new fields would look as follows:
>
>           Item1  Item2  Item3  Item4  Item5
>Person1        0      1      0      1      0
>Person2        1      0      1      1      1
>Person3        0      1      1      0      0

Here is an idea:
X <- c(4,5,NA,NA,NA) # one row
Y <- rep(NA,5) # an empty row
Y[X] <- 1

Y is now
NA NA NA 1 1
which is what you want.

So you need to do this on each row and then convert the NAs to
0s.  So first create an empty data frame, the same size as your
original one X, like my Y.  Call it Y.  Then a loop?  (I can't
think of a better way just now, like with mapply.)

for (i in [whatever]) Y[i][X[i]] <- 1

(Not tested.)  Jon
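
One way the loop idea above might be written out in full is sketched
below. It assumes X and Y are matrices rather than data frames, and it
drops the NAs from each row before indexing; the name picked is
introduced here for illustration.

```r
# X holds the answers as a matrix, one row per person (the example data)
X <- matrix(c(4,  2, NA, NA, NA,
              1,  3,  4,  5, NA,
              3,  2, NA, NA, NA), nrow = 3, byrow = TRUE)

# Y starts as all NA, same shape as X
Y <- matrix(NA, nrow = nrow(X), ncol = ncol(X))

for (i in seq_len(nrow(X))) {
	picked <- X[i, ]
	picked <- picked[!is.na(picked)]  # drop the unused answer slots
	Y[i, picked] <- 1
}

# Convert the remaining NAs to 0s
Y[is.na(Y)] <- 0
Y
```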
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page:            http://www.sas.upenn.edu/~baron
R search page:        http://finzi.psych.upenn.edu/
