[R] Multiple-Response Analysis: Cleaning of Duplicate Codes

Tue Apr 25 19:28:35 CEST 2017

How about:

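# Collapse each sub code to its main code (e.g. 120 -> 100, 495 -> 400).
# Run this on the original five c05_ columns, before the cbind() in the
# quoted script; otherwise the appended _r01 columns count as repeats.
d_sample_1 <- floor(d_sample / 100) * 100

# Within each respondent (row), keep the first occurrence of a main code
# and set any repeats to NA
for (i in seq_len(nrow(d_sample_1))) {
    d_sample_1[i, duplicated(unlist(d_sample_1[i, ]))] <- NA
}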
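
If you prefer to avoid the explicit loop, the same idea can be sketched with apply(). This is only an illustration on a made-up three-respondent frame: d_small, dedup_row and main_unique are hypothetical names, not part of your data.

```r
# Hypothetical miniature of d_sample: three respondents, three response slots.
d_small <- data.frame(
  c05_01 = c(130, 110, 410),
  c05_02 = c(120, NA, 130),
  c05_03 = c(NA, NA, 495)
)

# Collapse every sub code to its main code (130 -> 100, 495 -> 400).
main <- floor(as.matrix(d_small) / 100) * 100

# Keep the first occurrence of each main code in a row, blank the repeats.
dedup_row <- function(x) {
  x[duplicated(x)] <- NA
  x
}

# apply() returns one column per input row, so transpose back.
main_unique <- t(apply(main, 1, dedup_row))

# Each respondent now contributes at most once per main code.
table(main_unique)
```

In this toy example the first respondent's 130 and 120 collapse to a single 100, and the third respondent's 410 and 495 collapse to a single 400, so table() counts each respondent once per main code.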

B.

> On Apr 25, 2017, at 1:10 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> 
> If I understand you correctly, one way is:
> 
>> z <- rep(LETTERS[1:3],4)
>> z
> [1] "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C"
>> z[!duplicated(z)]
> [1] "A" "B" "C"
> 
> 
> ?duplicated
> 
> -- Bert
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Tue, Apr 25, 2017 at 9:36 AM,  <G.Maubach at weinwolf.de> wrote:
>> Hi All,
>> 
>> in my current project I am working with multiple-response questions
>> (MRSets):
>> 
>> -- Coding --
>> 100 Main Code 1
>> 110 Sub Code 1.1
>> 120 Sub Code 1.2
>> 130 Sub Code 1.3
>> 
>> 200 Main Code 2
>> 210 Sub Code 2.1
>> 220 Sub Code 2.2
>> 230 Sub Code 2.3
>> 
>> 300 Main Code 3
>> 310 Sub Code 3.1
>> 320 Sub Code 3.2
>> 
>> The coding for the variables is too detailed. Therefore I have recoded all
>> sub codes to their respective main code, e.g. all 110, 120 and 130 to 100,
>> all 210, 220 and 230 to 200, and all 310, 320 and 330 to 300.
>> 
>> As a result, some respondents now have the same main code several times.
>> If respondent 1 was coded with 120 and 130, the values after recoding
>> are 100 and 100. Counting both would weight this respondent's answer
>> by a factor of 2. This is not my aim. I would like to count the 100
>> only once per respondent.
>> 
>> Here is my script so far:
>> 
>> # -- cut --
>> 
>> library(expss)
>> 
>> d_sample <-
>>  structure(
>>    list(
>>      c05_01 = c(
>>        110,
>>        110,
>>        130,
>>        110,
>>        110,
>>        110,
>>        110,
>>        110,
>>        110,
>>        110,
>>        110,
>>        999,
>>        110,
>>        495,
>>        160,
>>        110,
>>        410
>>      ),
>>      c05_02 = c(NA,
>>                 NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA,
>>                 170, NA, 130),
>>      c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410,
>>                 NA, NA, NA, NA, NA, NA, NA),
>>      c05_04 = c(
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_
>>      ),
>>      c05_05 = c(
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_
>>      )
>>    ),
>>    .Names = c("c05_01",
>>               "c05_02", "c05_03", "c05_04", "c05_05"),
>>    row.names = c(
>>      "1",
>>      "2",
>>      "3",
>>      "4",
>>      "5",
>>      "10",
>>      "11",
>>      "12",
>>      "13",
>>      "14",
>>      "15",
>>      "20",
>>      "21",
>>      "22",
>>      "23",
>>      "24",
>>      "25"
>>    ),
>>    class = "data.frame"
>>  )
>> 
>> c05_xx_r01 <- d_sample %>%
>>  select(starts_with("c05_")) %>%
>>  recode(c(
>>    110 %thru% 195 ~ 100,
>>    210 %thru% 295 ~ 200,
>>    310 %thru% 395 ~ 300,
>>    410 %thru% 495 ~ 400,
>>    510 %thru% 595 ~ 500,
>>    810 %thru% 895 ~ 800,
>>    910 %thru% 999 ~ 900))
>> names(c05_xx_r01) <- paste0("c05_0", 1:5, "_r01")
>> d_sample <- cbind(d_sample, c05_xx_r01)
>> 
>> # -- cut --
>> 
>> I would like to eliminate all duplicate codes, e.g. reduce 100 and 100 for
>> the respondents in rows 3, 6, 13, 14 and 15 to a single 100:
>> 
>> # -- cut --
>> d_sample_1 <-
>>  structure(
>>    list(
>>      c05_01 = c(
>>        110,
>>        110,
>>        130,
>>        110,
>>        110,
>>        110,
>>        110,
>>        110,
>>        110,
>>        110,
>>        110,
>>        999,
>>        110,
>>        495,
>>        160,
>>        110,
>>        410
>>      ),
>>      c05_02 = c(NA,
>>                 NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA,
>>                 170, NA, 130),
>>      c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410,
>>                 NA, NA, NA, NA, NA, NA, NA),
>>      c05_04 = c(
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_
>>      ),
>>      c05_05 = c(
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_,
>>        NA_real_
>>      ),
>>      c05_01_r01 = c(
>>        100,
>>        100,
>>        100,
>>        100,
>>        100,
>>        100,
>>        100,
>>        100,
>>        100,
>>        100,
>>        100,
>>        900,
>>        100,
>>        400,
>>        100,
>>        100,
>>        400
>>      ),
>>      c05_02_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA,
>>                     NA, NA, NA, NA, NA, NA, NA, NA, 100),
>>      c05_03_r01 = c(NA, NA,
>>                     NA, NA, NA, NA, NA, NA, NA, 400, NA, NA, NA, NA, NA,
>>                     NA, NA),
>>      c05_04_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>>                     NA, NA, NA, NA, NA, NA),
>>      c05_05_r01 = c(NA, NA, NA, NA, NA,
>>                     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
>>    ),
>>    .Names = c(
>>      "c05_01",
>>      "c05_02",
>>      "c05_03",
>>      "c05_04",
>>      "c05_05",
>>      "c05_01_r01",
>>      "c05_02_r01",
>>      "c05_03_r01",
>>      "c05_04_r01",
>>      "c05_05_r01"
>>    ),
>>    row.names = c(
>>      "1",
>>      "2",
>>      "3",
>>      "4",
>>      "5",
>>      "10",
>>      "11",
>>      "12",
>>      "13",
>>      "14",
>>      "15",
>>      "20",
>>      "21",
>>      "22",
>>      "23",
>>      "24",
>>      "25"
>>    ),
>>    class = "data.frame"
>>  )
>> 
>> # -- cut --
>> 
>> How could I achieve this?
>> 
>> Kind regards
>> 
>> Georg
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.