[R] sorting in 'merge'

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Mon Jan 21 12:22:47 CET 2008


jiho wrote:
> [...snip...]
> 	the result is still somehow sorted according to the order of b. I  
> would have expected the output to be:
>
> merge(b,a,sort=F)
>    field1 field2      var2      var1
> 1      2      1 0.2739025 0.5134574
> 2      2      2 0.5147113 0.8063110
> 3      1      2 0.2958369 0.4309419
> 4      1      1 0.3703116 0.8327855
> 5      2      1 0.2739025 0.5134574
>
> Is it possible to get this output (another function similar to merge)?  
> What is the overall reason (if someone knows it) for the current  
> behaviour of merge?
>
>   
Well, the documentation says that the order is "unspecified". That means
that expecting anything specific is likely to be wrong (and even if you
happen to guess correctly, the answer may be wrong next year!).

Merge algorithms generally require sorting of data for efficiency, and
putting things back in the original order (or any other order) adds
complexity. It is not even at all clear what the "original order"
actually means in cases of many-many matching (or alternating one-many
and many-one).
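
To illustrate the many-many case, here is a minimal sketch with made-up data (x, y, key, xval and yval are hypothetical names, not from the original question). Each row of x matches both rows of y, so the result has four rows, and no arrangement of them is obviously "the original order" of both inputs at once:

```r
# Hypothetical data: the key "k" is duplicated on both sides.
x <- data.frame(key = c("k", "k"), xval = c("x1", "x2"))
y <- data.frame(key = c("k", "k"), yval = c("y1", "y2"))

# Every row of x pairs with every row of y that shares its key.
m <- merge(x, y, sort = FALSE)
nrow(m)   # 2 x 2 = 4 matched rows
```

Which of the four pairs comes first is exactly the part that the documentation leaves unspecified.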

To sort according to the original order of b, I'd just do it explicitly:

# add a row-index column to b, merge, then reorder on that index
m <- merge(cbind(id=seq_len(nrow(b)), b), a, sort=FALSE)
m[order(m$id),]
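
For anyone who wants to try this, a self-contained sketch follows. The data frames are invented for illustration (the values are not the poster's), and dropping the helper column and resetting the row names at the end are optional tidying steps I am assuming you would want:

```r
# Hypothetical data, made up for illustration (not the poster's a and b).
a <- data.frame(field1 = c(1, 1, 2, 2),
                field2 = c(1, 2, 1, 2),
                var1   = c(10, 20, 30, 40))
b <- data.frame(field1 = c(2, 2, 1, 1, 2),
                field2 = c(1, 2, 2, 1, 1),
                var2   = c(0.1, 0.2, 0.3, 0.4, 0.5))

# Tag each row of b with its position before merging.
b_id <- cbind(id = seq_len(nrow(b)), b)
m <- merge(b_id, a, sort = FALSE)

# Restore b's row order, then drop the helper column and stale row names.
m <- m[order(m$id), ]
m$id <- NULL
rownames(m) <- NULL
m
```

Here every row of b matches exactly one row of a, so the result has one row per row of b, in b's order. If a row of b matched several rows of a, those matches would sit next to each other, but their relative order would still depend on merge.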

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


