[R] unexpected sort order with merge

Johann Hibschman jhibschman at gmail.com
Wed Apr 6 19:37:34 CEST 2011


`merge` returns its rows sorted as if the by-columns were character
vectors, not according to the actual class of the by-columns.

> tmp <- merge(data.frame(f=ordered(c("a","b","b","a","b"),
                                    levels=c("b","a")),
                          x=1:5),
               data.frame(f=ordered(c("a","b"),
                                    levels=c("b","a")),
                          y=c(10,20)))
> tmp
  f x  y
1 a 1 10
2 a 4 10
3 b 2 20
4 b 3 20
5 b 5 20

> tmp[order(tmp$f),]
  f x  y
3 b 2 20
4 b 3 20
5 b 5 20
1 a 1 10
2 a 4 10

I expected the second order, not the first.
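
As a stopgap, something like the following sketch seems to give the order
I want. It assumes that `merge(..., sort=FALSE)` keeps `f` as an ordered
factor in the result, and it sorts explicitly afterwards instead of
relying on merge's own sort. The names `left`, `right` and `res` are just
placeholders for this example.

```r
# Same data as in the example above, split into two named data frames.
left  <- data.frame(f = ordered(c("a", "b", "b", "a", "b"),
                                levels = c("b", "a")),
                    x = 1:5)
right <- data.frame(f = ordered(c("a", "b"),
                                levels = c("b", "a")),
                    y = c(10, 20))

# Skip merge's own sorting, then order by the factor levels (b before a),
# using x to break ties so the result is deterministic.
res <- merge(left, right, sort = FALSE)
res <- res[order(res$f, res$x), ]
res
```

This avoids depending on how `merge` sorts internally, at the cost of one
extra line after every merge.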

I actually ran into this issue when merging zoo yearmon columns, but
that adds a package dependency.  In that context, I observed different
behavior depending on whether I had one key or two:

> library(zoo)
> d1 <- data.frame(date=as.yearmon(2000 + (0:5)/12), icpn=500, foo=1:6)
> d2 <- data.frame(date=as.yearmon(2000 + (0:5)/12), icpn=500, bar=10*1:6)
> merge(d1,d2)
      date icpn foo bar
1 Apr 2000  500   4  40
2 Feb 2000  500   2  20
3 Jan 2000  500   1  10
4 Jun 2000  500   6  60
5 Mar 2000  500   3  30
6 May 2000  500   5  50

> d1 <- data.frame(date=as.yearmon(2000 + (0:5)/12), foo=1:6)
> d2 <- data.frame(date=as.yearmon(2000 + (0:5)/12), bar=10*1:6)
> merge(d1,d2)
      date foo bar
1 Jan 2000   1  10
2 Feb 2000   2  20
3 Mar 2000   3  30
4 Apr 2000   4  40
5 May 2000   5  50
6 Jun 2000   6  60

The first example appears to sort by the name of the date, not by the
actual date value.

The documentation of `merge` says the sort is "lexicographic", but I
assumed that was in the cartesian-product sense, not in some
convert-everything-to-character sense.
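
To make the two readings of "lexicographic" concrete, here is a small
sketch with made-up numeric keys (hypothetical data, not the zoo example
above). It only illustrates the difference between the two senses; it is
not meant as a description of what `merge` does internally.

```r
# Hypothetical two-column key, for illustration only.
d <- data.frame(k1 = c(10, 9, 100), k2 = c(1, 1, 1))

# Lexicographic in the tuple sense: compare k1 by value, then k2 by value.
by_value <- d[order(d$k1, d$k2), ]

# Lexicographic in the string sense: compare the keys as text.
by_text <- d[order(as.character(d$k1), as.character(d$k2)), ]

by_value$k1  # 9 10 100
by_text$k1   # 10 100 9
```

The first ordering is what I expected from `merge`; the second is the
kind of ordering the two-key yearmon example appears to show.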

Is this behavior expected?

Thanks,
Johann


P.S. 

> sessionInfo()
R version 2.10.1 (2009-12-14) 
x86_64-unknown-linux-gnu 

locale:
[1] C

attached base packages:
[1] grid      splines   stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] ggplot2_0.8.8   reshape_0.8.3   Rauto_1.0       plyr_1.1       
[5] zoo_1.6-4       Hmisc_3.7-0     survival_2.35-8 ascii_0.7      
[9] proto_0.3-8    

loaded via a namespace (and not attached):
[1] cluster_1.12.1  digest_0.4.2    lattice_0.17-26 tools_2.10.1


