[R] 'merge' function creating duplicate columns names in the output

jim holtman jholtman at gmail.com
Thu Mar 3 23:04:52 CET 2011


The "merge" command is creating duplicate column names in a dataframe
that is the result of the merge.  The following is the 'merge'
command:

x <- merge(invType
    , allocSlots
    , by.x = 'index'
    , by.y = 'indx'
    , all.x = TRUE
    )

The 'invType' dataframe was the result of a previous merge and has the
following column names that are probably causing the problem:
height.x
height.y
height

> str(invType)
'data.frame':   2219 obs. of  30 variables:
 $ loc     : chr  "F0AA63" "F0AA65" "F0AA73" "F0AA75" ...
 $ KLN     : int  3569383 3515513 3565497 3555138 3565162 3555001
3565139 3555886 3565796 3556647 ...
 $ comm    : int  451 57 560 40 560 39 560 40 560 46 ...
 $ case    : num  7.70e+09 1.00e+12 3.00e+12 1.00e+12 1.11e+09 ...
 $ desc    : chr  "PGPR RTC BONELESS WINGS" "GRTN POT CRNCH FISH
FILET" "TYSON CORNISH HENS TWN PK" "GGNT RSTD POT GRLC HERB" ...
 $ height.x: num  7.2 12.6 11 7.8 6.8 10.1 11.2 10 11 10.5 ...
 $ length  : num  14.5 15.8 20 15.6 22.2 15 20.2 15 17 19.8 ...
 $ weight  : num  11 16.3 39 11 35.6 6.5 36 4 30 12.5 ...
 $ width   : num  9.7 9.2 14.3 8 15.2 7.5 13.2 8.5 13 10 ...
 $ high    : int  5 2 3 3 4 3 3 3 3 3 ...
 $ pqty    : int  65 26 18 45 20 45 21 39 24 30 ...
 $ boh     : int  4372 58 1199 51 836 116 64 312 371 389 ...
 $ awm     : num  694 44.3 53.8 35 0.8 ...
 $ cubes   : num  0.586 1.06 1.821 0.563 1.328 ...
 $ pallet  : num  42 31.2 39 29.4 33.2 36.3 39.6 36 39 37.5 ...
 $ adm     : num  99.143 6.329 7.686 5 0.114 ...
 $ tie     : num  13 13 6 15 5 15 7 13 8 10 ...
 $ origComm: int  457 57 547 40 541 39 552 40 552 46 ...
 $ days    : num  0.656 6.162 2.342 11.998 216.853 ...
 $ class   : chr  "single" "double" "single" "double" ...
 $ top.x   : logi  TRUE TRUE FALSE TRUE FALSE TRUE ...
 $ comm_ord: Factor w/ 30 levels "37A","38A","43A",..: 25 5 23 15 23 8
23 15 23 16 ...
 $ type.x  : int  2 2 2 2 2 2 2 2 2 2 ...
 $ height.y: num  47 47 47 47 47 47 47 47 47 47 ...
 $ top.y   : logi  FALSE TRUE FALSE TRUE FALSE TRUE ...
 $ noChange: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ type.y  : int  2 2 2 2 2 2 2 2 2 2 ...
 $ depth   : num  48 48 48 48 48 48 48 48 48 48 ...
 $ height  : num  47 47 47 47 47 47 47 47 47 47 ...
 $ index   : int  1 2 3 4 5 6 7 8 9 10 ...


Now the "allocSlots" dataframe also has a column name 'height'

> str(allocSlots)
'data.frame':   2462 obs. of  6 variables:
 $ loc   : chr  "F1AA02" "F1AA12" "F1AA22" "F1AA32" ...
 $ height: num  72 72 72 72 72 72 72 72 72 72 ...
 $ depth : num  48 48 48 48 48 48 48 48 48 48 ...
 $ bay   : chr  "F1AA0" "F1AA0" "F1AA2" "F1AA2" ...
 $ indx  : int  1675 1617 1386 1096 1077 963 816 471 275 259 ...
 $ type  : int  1 1 1 1 1 1 1 1 1 1 ...


Here is the result of the 'merge': (notice that there are now two
'height.x' and 'height.y' columns in the dataframe:

> str(x)
'data.frame':   2219 obs. of  35 variables:
 $ index   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ loc.x   : chr  "F0AA63" "F0AA65" "F0AA73" "F0AA75" ...
 $ KLN     : int  3569383 3515513 3565497 3555138 3565162 3555001
3565139 3555886 3565796 3556647 ...
 $ comm    : int  451 57 560 40 560 39 560 40 560 46 ...
 $ case    : num  7.70e+09 1.00e+12 3.00e+12 1.00e+12 1.11e+09 ...
 $ desc    : chr  "PGPR RTC BONELESS WINGS" "GRTN POT CRNCH FISH
FILET" "TYSON CORNISH HENS TWN PK" "GGNT RSTD POT GRLC HERB" ...
 $ height.x: num  7.2 12.6 11 7.8 6.8 10.1 11.2 10 11 10.5 ...
 $ length  : num  14.5 15.8 20 15.6 22.2 15 20.2 15 17 19.8 ...
 $ weight  : num  11 16.3 39 11 35.6 6.5 36 4 30 12.5 ...
 $ width   : num  9.7 9.2 14.3 8 15.2 7.5 13.2 8.5 13 10 ...
 $ high    : int  5 2 3 3 4 3 3 3 3 3 ...
 $ pqty    : int  65 26 18 45 20 45 21 39 24 30 ...
 $ boh     : int  4372 58 1199 51 836 116 64 312 371 389 ...
 $ awm     : num  694 44.3 53.8 35 0.8 ...
 $ cubes   : num  0.586 1.06 1.821 0.563 1.328 ...
 $ pallet  : num  42 31.2 39 29.4 33.2 36.3 39.6 36 39 37.5 ...
 $ adm     : num  99.143 6.329 7.686 5 0.114 ...
 $ tie     : num  13 13 6 15 5 15 7 13 8 10 ...
 $ origComm: int  457 57 547 40 541 39 552 40 552 46 ...
 $ days    : num  0.656 6.162 2.342 11.998 216.853 ...
 $ class   : chr  "single" "double" "single" "double" ...
 $ top.x   : logi  TRUE TRUE FALSE TRUE FALSE TRUE ...
 $ comm_ord: Factor w/ 30 levels "37A","38A","43A",..: 25 5 23 15 23 8
23 15 23 16 ...
 $ type.x  : int  2 2 2 2 2 2 2 2 2 2 ...
 $ height.y: num  47 47 47 47 47 47 47 47 47 47 ...
 $ top.y   : logi  FALSE TRUE FALSE TRUE FALSE TRUE ...
 $ noChange: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ type.y  : int  2 2 2 2 2 2 2 2 2 2 ...
 $ depth.x : num  48 48 48 48 48 48 48 48 48 48 ...
 $ height.x: num  47 47 47 47 47 47 47 47 47 47 ...
 $ loc.y   : chr  "F1KC22" "F1BM34" "F1HC73" "F1FJ65" ...
 $ height.y: num  72 44 72 44 72 44 72 44 72 72 ...
 $ depth.y : num  48 48 48 48 48 48 48 48 48 48 ...
 $ bay     : chr  "F1KC2" "F1BM2" "F1HC7" "F1FJ5" ...
 $ type    : int  1 2 1 2 1 2 1 2 1 1 ...


My workaround is to change one of the "height" to something else to
avoid the problem, but someone else might stumble on the same error.
Should we expect 'merge' to ensure that the column names are unique in
the result?

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



More information about the R-help mailing list