[R] turning comma-separated strings from multiple-choice questions into flags

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Mon Sep 29 17:03:17 CEST 2008


June Kim wrote:
> Hello,
>
> I use Google Docs' Forms to conduct surveys online. Multiple-choice
> questions are coded as comma-separated values.
>
> For example,
>
> if the question is like:
>
> 1. What magazines do you currently subscribe to? (you can choose
> more than one)
> 1) Fast Company
> 2) Havard Business Review
> 3) Business Week
> 4) The Economist
>
> If the respondent chose 1) and 3), the answer is stored in a single
> spreadsheet cell as
>
> "Fast Company, Business Week"
>
> I read the data into R with read.csv. To analyze it, I need to turn
> that string into something like flags (indicator variables?).
> That is, there should be 4 variables whose values are either 1 or
> 0, indicating chosen or not chosen, respectively.
>
> Suppose the data looks something like this:
>
>> survey1
>   age                                    favorite_magazine
> 1  29                                         Fast Company
> 2  31                          Fast Company, Business Week
> 3  32 Havard Business Review, Business Week, The Economist
>
> Then I have to split the string in the favorite_magazine column to turn
> the data into something like this:
>
>> survey1transformed
>   age Fast Company Havard Business Review Business Week The Economist
> 1  29            1                      0             0             0
> 2  31            1                      0             1             0
> 3  32            0                      1             1             1
>
> I actually have many more multiple-choice questions in the survey.
>
> What is an easy, elegant, and natural way to do this in R?

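I'd look into something like as.data.frame(lapply(strings, grepl,
x=survey1$favorite_magazine, fixed=TRUE)), where strings <- c("Fast Company",
"Havard Business Review", ...). grepl gives one TRUE/FALSE per
respondent for each option; wrap it in as.integer if you want 1/0.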

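A minimal, self-contained sketch of that idea, assuming the data look exactly like survey1 in the question and that the option labels are spelled exactly as the form writes them (including "Havard"). The helper name make_flags is made up for illustration. It relies on grepl, which returns one TRUE/FALSE per respondent (grep would instead return the positions of the matching rows), and on as.integer to turn that into 1/0. For the three example rows this should give the same 0/1 pattern as survey1transformed.

```r
# Example data reconstructed from the question; in practice it would
# come from read.csv().
survey1 <- data.frame(
  age = c(29, 31, 32),
  favorite_magazine = c("Fast Company",
                        "Fast Company, Business Week",
                        "Havard Business Review, Business Week, The Economist"),
  stringsAsFactors = FALSE
)

# Answer options, spelled exactly as the form writes them.
strings <- c("Fast Company", "Havard Business Review",
             "Business Week", "The Economist")

# Hypothetical helper: one 0/1 column per option.
# grepl() gives TRUE/FALSE per respondent; fixed = TRUE treats the option
# text literally rather than as a regular expression; as.integer() maps
# TRUE/FALSE to 1/0.
make_flags <- function(x, choices) {
  flag_list <- lapply(choices, function(s)
    as.integer(grepl(s, x, fixed = TRUE)))
  names(flag_list) <- choices
  # check.names = FALSE keeps labels like "Fast Company" as column names.
  data.frame(flag_list, check.names = FALSE)
}

survey1transformed <- data.frame(age = survey1$age,
                                 make_flags(survey1$favorite_magazine, strings),
                                 check.names = FALSE)
survey1transformed
```

With many multiple-choice questions, the same helper can be called once per column and the results combined with cbind. One caveat of substring matching: an option whose label is contained in another (say "Week" alongside "Business Week") would also be flagged whenever the longer one was chosen.
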
(I take it that the form mechanism guarantees that everything is at
least misspelled the same way? If it alternates between "Havard" and
"Harvard", things get a bit trickier.)

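If the spellings are not consistent, one possible approach (a different technique from the grepl one-liner) is to split each answer on commas, trim the pieces, map known variants onto one canonical label, and then match whole labels exactly. This sketch assumes that no option label itself contains a comma; the answers vector and the fixes lookup table are hypothetical, and trimws() needs R 3.2.0 or later.

```r
# Hypothetical answers, including the inconsistent spelling "Havard".
answers <- c("Fast Company",
             "Fast Company, Business Week",
             "Harvard Business Review, Business Week, The Economist",
             "Havard Business Review")

# Hypothetical lookup table: names are known variants, values are the
# canonical labels they should be mapped to.
fixes <- c("Havard Business Review" = "Harvard Business Review")

# Split each answer on commas and strip surrounding whitespace.
pieces <- lapply(strsplit(answers, ",", fixed = TRUE), trimws)

# Replace known variants with their canonical spelling.
pieces <- lapply(pieces, function(p) {
  hit <- p %in% names(fixes)
  p[hit] <- unname(fixes[p[hit]])
  p
})

# All distinct options actually chosen, in sorted order.
choices <- sort(unique(unlist(pieces)))

# One row per respondent, one 0/1 column per option (exact matching,
# so one label being a substring of another is not a problem).
flags <- do.call(rbind, lapply(pieces, function(p) as.integer(choices %in% p)))
colnames(flags) <- choices
flags
```

Because the set of columns comes from the data rather than from a fixed list, any spelling that is not yet in fixes shows up as an extra column, which makes it easy to spot and add.
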
-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907



More information about the R-help mailing list