[R] Antwort: Re: Multiple-Response Analysis: Cleaning of Duplicate Codes (SOLVED)

Wed Apr 26 11:57:50 CEST 2017

Hi Bert,

many thanks for your reply. I appreciate your help a lot.

I would like to do the operation (= finding the duplicates) row-wise.

During this night a solution showed up in my dreams :) Instead of using 
duplicates() to flag and filter the values I could use unique instead with 
the same result. I tested:

# -- cut --

apply(X = c05_xx_r01, MARGIN = 1, unique)

# -- cut --

This finds the unique values for each row. That is nice but lacks the 
requirement that I need a dataframe with a set of variables back that is 
as long as the total amount of unique values for the complete 
data.frame/matrix or the amount of variable of the original data.frame 
respectively.

The result of the above operation gives a list instead of a data.frame due 
to the fact that the amount of resulting values vary from 1 to 7. 
Therefore no data.frame but a list is returned.

I search the web for a solution and found:

http://stackoverflow.com/questions/15753091/convert-mixed-length-named-list-to-data-frame

The complete solution would then look like:

# -- cut --

library(stringi)
library(tidyverse)
my_list <- apply(c05_xx_r01, MARGIN = 1, unique)
my_tibble <- as_tibble(stringi::stri_list2matrix(my_list, byrow = TRUE)
# DONE !

# -- cut --

All-in-all thanks again for your help.

Kind regards

Georg

P.S: I had a look into ?unique. The statement "unique(c05_xx_r01, MARGIN = 
1) does not do the job, cause this looks for unique combinations of values 
on all columns. But that is not the desired outcome.

Von:    Bert Gunter <bgunter.4567 at gmail.com>
An:     G.Maubach at weinwolf.de, 
Kopie:  R-help <r-help at r-project.org>
Datum:  25.04.2017 19:10
Betreff:        Re: [R] Multiple-Response Analysis: Cleaning of Duplicate 
Codes

If I understand you correctly, one way is:

> z <- rep(LETTERS[1:3],4)
> z
 [1] "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C"
> z[!duplicated(z)]
[1] "A" "B" "C"

?duplicated

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Apr 25, 2017 at 9:36 AM,  <G.Maubach at weinwolf.de> wrote:
> Hi All,
>
> in my current project I am working with multiple-response questions
> (MRSets):
>
> -- Coding --
> 100 Main Code 1
> 110 Sub Code 1.1
> 120 Sub Code 1.2
> 130 Sub Code 1.3
>
> 200 Main Code 2
> 210 Sub Code 2.1
> 220 Sub Code 2.2
> 230 Sub Code 2.3
>
> 300 Main Code 3
> 310 Sub Code 3.1
> 320 Sub Code 3.2
>
> The coding for the variables is to detailed. Therefore I have recoded 
all
> sub codes to the respective main code, e.g. all 110, 120 and 130 to 100,
> all 210, 220 and 230 to 200 and all 310, 320 and 330 to 300.
>
> Now it happens that some respondents get several times the same main 
code.
> If the coding was done for respondent 1 with 120 and 130 after recoding
> the values are 100 and 100. If I count this, it would mean that I weight
> the multiple values of this respondent by factor 2. This is not my aim. 
I
> would like to count the 100 for the respective respondent only once.
>
> Here is my script so far:
>
> # -- cut --
>
> library(expss)
>
> d_sample <-
>   structure(
>     list(
>       c05_01 = c(
>         110,
>         110,
>         130,
>         110,
>         110,
>         110,
>         110,
>         110,
>         110,
>         110,
>         110,
>         999,
>         110,
>         495,
>         160,
>         110,
>         410
>       ),
>       c05_02 = c(NA,
>                  NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA,
> 170,
>                  NA, 130),
>       c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410,
>                  NA, NA, NA, NA, NA, NA, NA),
>       c05_04 = c(
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_
>       ),
>       c05_05 = c(
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_
>       )
>     ),
>     .Names = c("c05_01",
>                "c05_02", "c05_03", "c05_04", "c05_05"),
>     row.names = c(
>       "1",
>       "2",
>       "3",
>       "4",
>       "5",
>       "10",
>       "11",
>       "12",
>       "13",
>       "14",
>       "15",
>       "20",
>       "21",
>       "22",
>       "23",
>       "24",
>       "25"
>     ),
>     class = "data.frame"
>   )
>
> c05_xx_r01 <- d_sample %>%
>   select(starts_with("c05_")) %>%
>   recode(c(
>     110 %thru% 195 ~ 100,
>     210 %thru% 295 ~ 200,
>     310 %thru% 395 ~ 300,
>     410 %thru% 495 ~ 400,
>     510 %thru% 595 ~ 500,
>     810 %thru% 895 ~ 800,
>     910 %thru% 999 ~ 900))
> names(c05_xx_r01) <- paste0("c05_0", 1:5, "_r01")
> d_sample <- cbind(d_sample, c05_xx_r01)
>
> # -- cut --
>
> I would like to eliminate all duplicates codes, e. g. 100 and 100 for
> respondents in row 3, 6, 13, 14 and 15 to 100 only once:
>
> # -- cut --
> d_sample_1 <-
>   structure(
>     list(
>       c05_01 = c(
>         110,
>         110,
>         130,
>         110,
>         110,
>         110,
>         110,
>         110,
>         110,
>         110,
>         110,
>         999,
>         110,
>         495,
>         160,
>         110,
>         410
>       ),
>       c05_02 = c(NA,
>                  NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA,
> 170,
>                  NA, 130),
>       c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410,
>                  NA, NA, NA, NA, NA, NA, NA),
>       c05_04 = c(
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_
>       ),
>       c05_05 = c(
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_,
>         NA_real_
>       ),
>       c05_01_r01 = c(
>         100,
>         100,
>         100,
>         100,
>         100,
>         100,
>         100,
>         100,
>         100,
>         100,
>         100,
>         900,
>         100,
>         400,
>         100,
>         100,
>         400
>       ),
>       c05_02_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA,
>                      NA, NA, NA, NA, NA, NA, NA, NA, 100),
>       c05_03_r01 = c(NA, NA,
>                      NA, NA, NA, NA, NA, NA, NA, 400, NA, NA, NA, NA, 
NA,
> NA, NA),
>       c05_04_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>                      NA, NA, NA, NA, NA, NA),
>       c05_05_r01 = c(NA, NA, NA, NA, NA,
>                      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
>     ),
>     .Names = c(
>       "c05_01",
>       "c05_02",
>       "c05_03",
>       "c05_04",
>       "c05_05",
>       "c05_01_r01",
>       "c05_02_r01",
>       "c05_03_r01",
>       "c05_04_r01",
>       "c05_05_r01"
>     ),
>     row.names = c(
>       "1",
>       "2",
>       "3",
>       "4",
>       "5",
>       "10",
>       "11",
>       "12",
>       "13",
>       "14",
>       "15",
>       "20",
>       "21",
>       "22",
>       "23",
>       "24",
>       "25"
>     ),
>     class = "data.frame"
>   )
>
> # -- cut --
>
> How could I achieve this?
>
> Kind regards
>
> Georg
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.