[R] Merging variables

David Winsemius dwinsemius at comcast.net
Mon Jun 6 14:51:01 CEST 2016


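You loop through each row, but in each iteration you assign a value to the entire "mismatch" column rather than to row i. The last value assigned was 1, because the final row of ds_test has both values missing, so the whole column ends up as 1.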
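
A minimal sketch of the fix, assuming the rest of the logic in the quoted function is what was intended: index the "mismatch" column by row i inside the loop. The local names v1 and v2 are introduced here only for readability, and `&&` replaces `&` because each condition is a single value.

```r
# Test data from the original post
customer.x <- c("Miller", "Smith", NA,    "Bird", NA)
customer.y <- c("Miller",  NA,     "Doe", "Fish", NA)
ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)

t_merge_variables <- function(dataset, var1, var2, merged_var) {
  # Initialize the result columns with typed NAs
  dataset[[merged_var]] <- rep(NA_character_, nrow(dataset))
  dataset[["mismatch"]] <- rep(NA_real_, nrow(dataset))

  for (i in seq_len(nrow(dataset))) {
    v1 <- dataset[[i, var1]]  # illustrative local names
    v2 <- dataset[[i, var2]]

    if (is.na(v1) && is.na(v2)) {
      dataset[i, "mismatch"] <- 1      # both missing: set row i only
    } else if (is.na(v2)) {
      dataset[i, merged_var] <- v1     # only var1 filled
      dataset[i, "mismatch"] <- 0
    } else if (is.na(v1)) {
      dataset[i, merged_var] <- v2     # only var2 filled
      dataset[i, "mismatch"] <- 0
    } else if (v1 == v2) {
      dataset[i, merged_var] <- v1     # both filled and equal
      dataset[i, "mismatch"] <- 0
    } else {
      dataset[i, "mismatch"] <- 2      # both filled but different
    }
  }
  dataset
}

ds_var_merge1 <- t_merge_variables(dataset = ds_test,
                                   var1 = "customer.x",
                                   var2 = "customer.y",
                                   merged_var = "customer")
ds_var_merge1
```

With the test data above, "mismatch" comes out as 0, 0, 0, 2, 1 and "customer" as "Miller", "Smith", "Doe", NA, NA.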

> On Jun 6, 2016, at 8:29 AM, G.Maubach at weinwolf.de wrote:
> 
> Hi All,
> 
> I merged two datasets:
> 
> ds_merge1 <- merge(x = ds_bw_customer_4_match,
>  y = ds_zww_customer_4_match,
>  by.x = "customer", by.y = "customer",
>  all.x = TRUE, all.y = FALSE)
> 
> R created a new dataset with the variables customer.x and customer.y. I 
> would like to merge these two variables back together. I wrote a little 
> function (code can be run) for it:
> 
> -- cut --
> 
> customer.x <- c("Miller", "Smith", NA,    "Bird", NA)
> customer.y <- c("Miller",  NA,     "Doe", "Fish", NA)
> ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)
> 
> t_merge_variables <-
>  function(dataset,
>           var1,
>           var2,
>           merged_var) {
> 
>    # Initialize
>    dataset[[merged_var]] = rep(NA, nrow(dataset))
>    dataset[["mismatch"]] = rep(NA, nrow(dataset))
> 
>    for (i in 1:nrow(dataset)) {
> 
>      # Check 1: var1 missing, var2 missing
>      if (is.na(dataset[[i, var1]]) &
>          is.na(dataset[[i, var2]])) {
>        dataset[["mismatch"]] <- 1  # var1 & var2 are missing
> 
>      # Check 2: var1 filled, var2 missing
>      } else if (!is.na(dataset[[i, var1]]) &
>                 is.na(dataset[[i, var2]])) {
>        dataset[[i, merged_var]] <- dataset[[i, var1]]
>        dataset[["mismatch"]] <- 0
> 
>      # Check 3: var1 missing, var2 filled
>      } else if (is.na(dataset[[i, var1]]) &
>                 !is.na(dataset[[i, var2]])) {
>        dataset[[i, merged_var]] <- dataset[[i, var2]]
>        dataset[["mismatch"]] <-  0
> 
>      # Check 4: var1 == var2
>      } else if (dataset[[i, var1]] == dataset[[i, var2]]) {
>        dataset[[i, merged_var]] <- dataset[[i, var1]]
>        dataset[["mismatch"]] <- 0
> 
>      # Leftover: var1 != var2
>      } else {
>        dataset[[i, merged_var]] <- NA
>        dataset[["mismatch"]] <- 2  # var1 != var2
>      }  # end if
>    }  # end for
>    return(dataset)
> }
> 
> ds_var_merge1 <- t_merge_variables(dataset = ds_test,
>  var1 = "customer.x",
>  var2 = "customer.y",
>  merged_var = "customer")
> 
> ds_var_merge1
> 
> -- cut --
> 
> It runs without error but delivers the wrong values in the variable 
> "mismatch". This variable is always 1 although it should be 0, 1 or 2 
> depending on the row.
> 
> Can you tell me why the variable is not correctly set?
> 
> Kind regards
> 
> Georg
> 


